Re: Oops running iptables -F OUTPUT

2018-08-28 Thread Nicholas Piggin
On Tue, 28 Aug 2018 18:09:09 +0200
Ard Biesheuvel  wrote:

> On 28 August 2018 at 15:56, Ard Biesheuvel  wrote:
> > Hello Andreas, Nick,
> >
> > On 28 August 2018 at 06:06, Nicholas Piggin  
> > wrote:  
> >> On Mon, 27 Aug 2018 19:11:01 +0200
> >> Andreas Schwab  wrote:
> >>  
> >>> I'm getting this Oops when running iptables -F OUTPUT:
> >>>
> >>> [   91.139409] Unable to handle kernel paging request for data at address 
> >>> 0xd001fff12f34
> >>> [   91.139414] Faulting instruction address: 0xd16a5718
> >>> [   91.139419] Oops: Kernel access of bad area, sig: 11 [#1]
> >>> [   91.139426] BE SMP NR_CPUS=2 PowerMac
> >>> [   91.139434] Modules linked in: iptable_filter ip_tables x_tables 
> >>> bpfilter nfsd auth_rpcgss lockd grace nfs_acl sunrpc tun af_packet 
> >>> snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus 
> >>> snd_aoa_soundbus snd_pcm_oss snd_pcm snd_seq snd_timer snd_seq_device 
> >>> snd_mixer_oss snd sungem sr_mod firewire_ohci cdrom sungem_phy soundcore 
> >>> firewire_core pata_macio crc_itu_t sg hid_generic usbhid linear md_mod 
> >>> ohci_pci ohci_hcd ehci_pci ehci_hcd usbcore usb_common dm_snapshot 
> >>> dm_bufio dm_mirror dm_region_hash dm_log dm_mod sata_svw
> >>> [   91.139522] CPU: 1 PID: 3620 Comm: iptables Not tainted 4.19.0-rc1 #1
> >>> [   91.139526] NIP:  d16a5718 LR: d16a569c CTR: 
> >>> c06f560c
> >>> [   91.139531] REGS: c001fa577670 TRAP: 0300   Not tainted  
> >>> (4.19.0-rc1)
> >>> [   91.139534] MSR:  9200b032   CR: 
> >>> 84002484  XER: 2000
> >>> [   91.139553] DAR: d001fff12f34 DSISR: 4000 IRQMASK: 0
> >>> GPR00: d16a569c c001fa5778f0 d16b0400 
> >>> GPR04: 0002  8001fa46418e c001fa0d05c8
> >>> GPR08: d16b0400 d00037f13000 0001ff3e7000 d16a6fb8
> >>> GPR12: c06f560c c780  
> >>> GPR16: 11635010 3fffa1b7aa68  
> >>> GPR20: 0003 10013918 116350c0 c0b88990
> >>> GPR24: c0b88ba4  d001fff12f34 
> >>> GPR28: d16b8000 c001fa20f400 c001fa20f440 
> >>> [   91.139627] NIP [d16a5718] .alloc_counters.isra.10+0xbc/0x140 
> >>> [ip_tables]
> >>> [   91.139634] LR [d16a569c] .alloc_counters.isra.10+0x40/0x140 
> >>> [ip_tables]
> >>> [   91.139638] Call Trace:
> >>> [   91.139645] [c001fa5778f0] [d16a569c] 
> >>> .alloc_counters.isra.10+0x40/0x140 [ip_tables] (unreliable)
> >>> [   91.139655] [c001fa5779b0] [d16a5b54] 
> >>> .do_ipt_get_ctl+0x110/0x2ec [ip_tables]
> >>> [   91.139666] [c001fa577aa0] [c06233e0] 
> >>> .nf_getsockopt+0x68/0x88
> >>> [   91.139674] [c001fa577b40] [c0631608] 
> >>> .ip_getsockopt+0xbc/0x128
> >>> [   91.139682] [c001fa577bf0] [c065adf4] 
> >>> .raw_getsockopt+0x18/0x5c
> >>> [   91.139690] [c001fa577c60] [c05b5f60] 
> >>> .sock_common_getsockopt+0x2c/0x40
> >>> [   91.139697] [c001fa577cd0] [c05b3394] 
> >>> .__sys_getsockopt+0xa4/0xd0
> >>> [   91.139704] [c001fa577d80] [c05b5ab0] 
> >>> .__se_sys_socketcall+0x238/0x2b4
> >>> [   91.139712] [c001fa577e30] [c000a31c] system_call+0x5c/0x70
> >>> [   91.139716] Instruction dump:
> >>> [   91.139721] 39290040 7d3d4a14 7fbe4840 409cff98 8138 2b890001 
> >>> 419d000c 393e0060
> >>> [   91.139736] 4810 7d57c82a e93e0060 7d295214 <815a> 794807e1 
> >>> 41e20010 7c210b78
> >>> [   91.139752] ---[ end trace f5d1d5431651845d ]---  
> >>
> >> This is due to 7290d58095 ("module: use relative references for
> >> __ksymtab entries"). This part of kernel/module.c -
> >>
> >>     /* Divert to percpu allocation if a percpu var. */
> >>     if (sym[i].st_shndx == info->index.pcpu)
> >>             secbase = (unsigned long)mod_percpu(mod);
> >>     else
> >>             secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
> >>     sym[i].st_value += secbase;
> >>
> >> Causes the distance to the target to exceed 32-bits on powerpc, so
> >> it doesn't fit in a rel32 reloc. Not sure how other archs cope.
> >>  
> >
> > Apologies for the breakage. It does indeed appear to affect all
> > architectures, and I'm a bit puzzled why you are the first one to spot
> > it.
> >
> > I will try to find a clean way to special case the per-CPU variable
> > __ksymtab references in the generic module code, and if that is too
> > cumbersome, we can switch to 64-bit relative references (or rather,
> > native word size relative references) instead. Or revert the whole
> > thing ...  
> 
> OK, after a bit of digging, and confirming that the arm64
> implementation works as expected (its module loader actually detects
> overflows of the 32-bit place relative relocations, so the problem
> definitely does not occur there), I think I found the explanation why

Re: Oops running iptables -F OUTPUT

2018-08-28 Thread Nicholas Piggin
On Wed, 29 Aug 2018 13:28:27 +1000
Nicholas Piggin  wrote:

> On Tue, 28 Aug 2018 14:06:32 +1000
> Nicholas Piggin  wrote:
> 
> > On Mon, 27 Aug 2018 19:11:01 +0200
> > Andreas Schwab  wrote:
> >   
> > > I'm getting this Oops when running iptables -F OUTPUT:
> > > 
> > > [   91.139409] Unable to handle kernel paging request for data at address 
> > > 0xd001fff12f34
> > > [   91.139414] Faulting instruction address: 0xd16a5718
> > > [   91.139419] Oops: Kernel access of bad area, sig: 11 [#1]
> > > [   91.139426] BE SMP NR_CPUS=2 PowerMac
> > > [   91.139434] Modules linked in: iptable_filter ip_tables x_tables 
> > > bpfilter nfsd auth_rpcgss lockd grace nfs_acl sunrpc tun af_packet 
> > > snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus 
> > > snd_aoa_soundbus snd_pcm_oss snd_pcm snd_seq snd_timer snd_seq_device 
> > > snd_mixer_oss snd sungem sr_mod firewire_ohci cdrom sungem_phy soundcore 
> > > firewire_core pata_macio crc_itu_t sg hid_generic usbhid linear md_mod 
> > > ohci_pci ohci_hcd ehci_pci ehci_hcd usbcore usb_common dm_snapshot 
> > > dm_bufio dm_mirror dm_region_hash dm_log dm_mod sata_svw
> > > [   91.139522] CPU: 1 PID: 3620 Comm: iptables Not tainted 4.19.0-rc1 #1
> > > [   91.139526] NIP:  d16a5718 LR: d16a569c CTR: 
> > > c06f560c
> > > [   91.139531] REGS: c001fa577670 TRAP: 0300   Not tainted  
> > > (4.19.0-rc1)
> > > [   91.139534] MSR:  9200b032   CR: 
> > > 84002484  XER: 2000
> > > [   91.139553] DAR: d001fff12f34 DSISR: 4000 IRQMASK: 0 
> > > GPR00: d16a569c c001fa5778f0 d16b0400 
> > >  
> > > GPR04: 0002  8001fa46418e 
> > > c001fa0d05c8 
> > > GPR08: d16b0400 d00037f13000 0001ff3e7000 
> > > d16a6fb8 
> > > GPR12: c06f560c c780  
> > >  
> > > GPR16: 11635010 3fffa1b7aa68  
> > >  
> > > GPR20: 0003 10013918 116350c0 
> > > c0b88990 
> > > GPR24: c0b88ba4  d001fff12f34 
> > >  
> > > GPR28: d16b8000 c001fa20f400 c001fa20f440 
> > >  
> > > [   91.139627] NIP [d16a5718] .alloc_counters.isra.10+0xbc/0x140 
> > > [ip_tables]
> > > [   91.139634] LR [d16a569c] .alloc_counters.isra.10+0x40/0x140 
> > > [ip_tables]
> > > [   91.139638] Call Trace:
> > > [   91.139645] [c001fa5778f0] [d16a569c] 
> > > .alloc_counters.isra.10+0x40/0x140 [ip_tables] (unreliable)
> > > [   91.139655] [c001fa5779b0] [d16a5b54] 
> > > .do_ipt_get_ctl+0x110/0x2ec [ip_tables]
> > > [   91.139666] [c001fa577aa0] [c06233e0] 
> > > .nf_getsockopt+0x68/0x88
> > > [   91.139674] [c001fa577b40] [c0631608] 
> > > .ip_getsockopt+0xbc/0x128
> > > [   91.139682] [c001fa577bf0] [c065adf4] 
> > > .raw_getsockopt+0x18/0x5c
> > > [   91.139690] [c001fa577c60] [c05b5f60] 
> > > .sock_common_getsockopt+0x2c/0x40
> > > [   91.139697] [c001fa577cd0] [c05b3394] 
> > > .__sys_getsockopt+0xa4/0xd0
> > > [   91.139704] [c001fa577d80] [c05b5ab0] 
> > > .__se_sys_socketcall+0x238/0x2b4
> > > [   91.139712] [c001fa577e30] [c000a31c] system_call+0x5c/0x70
> > > [   91.139716] Instruction dump:
> > > [   91.139721] 39290040 7d3d4a14 7fbe4840 409cff98 8138 2b890001 
> > > 419d000c 393e0060 
> > > [   91.139736] 4810 7d57c82a e93e0060 7d295214 <815a> 794807e1 
> > > 41e20010 7c210b78 
> > > [   91.139752] ---[ end trace f5d1d5431651845d ]---
> > 
> > This is due to 7290d58095 ("module: use relative references for
> > __ksymtab entries"). This part of kernel/module.c -
> > 
> >     /* Divert to percpu allocation if a percpu var. */
> >     if (sym[i].st_shndx == info->index.pcpu)
> >             secbase = (unsigned long)mod_percpu(mod);
> >     else
> >             secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
> >     sym[i].st_value += secbase;
> > 
> > Causes the distance to the target to exceed 32-bits on powerpc, so
> > it doesn't fit in a rel32 reloc. Not sure how other archs cope.  
> 
> Any progress on this one? I had a bit of a look but can't see a really
> trivial fix and don't have a lot of time to work on it. Maybe use 64-bit
> relative offsets for per-cpu exports, or better might be to apply the
> per-cpu fixup when linking against the symbol rather than when writing
> the module symbol table.
> 
> Until then I'd like to just remove HAVE_ARCH_PREL32_RELOCATIONS from
> powerpc/Kconfig, but if other archs are going to have issues too, we
> could just revert
> 
> 271ca788774aa ("arch: enable relative relocations for arm64, power and x86")
> 
> arm64, x86 -- can the distance between your module percpu data link
> location -> module percpu runtime allocation location exceed 31 bits?

[Sorry ignore this, I missed some mail, will 

Re: Oops running iptables -F OUTPUT

2018-08-28 Thread Nicholas Piggin
On Tue, 28 Aug 2018 14:06:32 +1000
Nicholas Piggin  wrote:

> On Mon, 27 Aug 2018 19:11:01 +0200
> Andreas Schwab  wrote:
> 
> > I'm getting this Oops when running iptables -F OUTPUT:
> > 
> > [   91.139409] Unable to handle kernel paging request for data at address 
> > 0xd001fff12f34
> > [   91.139414] Faulting instruction address: 0xd16a5718
> > [   91.139419] Oops: Kernel access of bad area, sig: 11 [#1]
> > [   91.139426] BE SMP NR_CPUS=2 PowerMac
> > [   91.139434] Modules linked in: iptable_filter ip_tables x_tables 
> > bpfilter nfsd auth_rpcgss lockd grace nfs_acl sunrpc tun af_packet 
> > snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus 
> > snd_aoa_soundbus snd_pcm_oss snd_pcm snd_seq snd_timer snd_seq_device 
> > snd_mixer_oss snd sungem sr_mod firewire_ohci cdrom sungem_phy soundcore 
> > firewire_core pata_macio crc_itu_t sg hid_generic usbhid linear md_mod 
> > ohci_pci ohci_hcd ehci_pci ehci_hcd usbcore usb_common dm_snapshot dm_bufio 
> > dm_mirror dm_region_hash dm_log dm_mod sata_svw
> > [   91.139522] CPU: 1 PID: 3620 Comm: iptables Not tainted 4.19.0-rc1 #1
> > [   91.139526] NIP:  d16a5718 LR: d16a569c CTR: 
> > c06f560c
> > [   91.139531] REGS: c001fa577670 TRAP: 0300   Not tainted  (4.19.0-rc1)
> > [   91.139534] MSR:  9200b032   CR: 
> > 84002484  XER: 2000
> > [   91.139553] DAR: d001fff12f34 DSISR: 4000 IRQMASK: 0 
> > GPR00: d16a569c c001fa5778f0 d16b0400  
> > GPR04: 0002  8001fa46418e c001fa0d05c8 
> > GPR08: d16b0400 d00037f13000 0001ff3e7000 d16a6fb8 
> > GPR12: c06f560c c780   
> > GPR16: 11635010 3fffa1b7aa68   
> > GPR20: 0003 10013918 116350c0 c0b88990 
> > GPR24: c0b88ba4  d001fff12f34  
> > GPR28: d16b8000 c001fa20f400 c001fa20f440  
> > [   91.139627] NIP [d16a5718] .alloc_counters.isra.10+0xbc/0x140 
> > [ip_tables]
> > [   91.139634] LR [d16a569c] .alloc_counters.isra.10+0x40/0x140 
> > [ip_tables]
> > [   91.139638] Call Trace:
> > [   91.139645] [c001fa5778f0] [d16a569c] 
> > .alloc_counters.isra.10+0x40/0x140 [ip_tables] (unreliable)
> > [   91.139655] [c001fa5779b0] [d16a5b54] 
> > .do_ipt_get_ctl+0x110/0x2ec [ip_tables]
> > [   91.139666] [c001fa577aa0] [c06233e0] 
> > .nf_getsockopt+0x68/0x88
> > [   91.139674] [c001fa577b40] [c0631608] 
> > .ip_getsockopt+0xbc/0x128
> > [   91.139682] [c001fa577bf0] [c065adf4] 
> > .raw_getsockopt+0x18/0x5c
> > [   91.139690] [c001fa577c60] [c05b5f60] 
> > .sock_common_getsockopt+0x2c/0x40
> > [   91.139697] [c001fa577cd0] [c05b3394] 
> > .__sys_getsockopt+0xa4/0xd0
> > [   91.139704] [c001fa577d80] [c05b5ab0] 
> > .__se_sys_socketcall+0x238/0x2b4
> > [   91.139712] [c001fa577e30] [c000a31c] system_call+0x5c/0x70
> > [   91.139716] Instruction dump:
> > [   91.139721] 39290040 7d3d4a14 7fbe4840 409cff98 8138 2b890001 
> > 419d000c 393e0060 
> > [   91.139736] 4810 7d57c82a e93e0060 7d295214 <815a> 794807e1 
> > 41e20010 7c210b78 
> > [   91.139752] ---[ end trace f5d1d5431651845d ]---  
> 
> This is due to 7290d58095 ("module: use relative references for
> __ksymtab entries"). This part of kernel/module.c -
> 
>     /* Divert to percpu allocation if a percpu var. */
>     if (sym[i].st_shndx == info->index.pcpu)
>             secbase = (unsigned long)mod_percpu(mod);
>     else
>             secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
>     sym[i].st_value += secbase;
> 
> Causes the distance to the target to exceed 32-bits on powerpc, so
> it doesn't fit in a rel32 reloc. Not sure how other archs cope.

Any progress on this one? I had a bit of a look but can't see a really
trivial fix and don't have a lot of time to work on it. Maybe use 64-bit
relative offsets for per-cpu exports, or better might be to apply the
per-cpu fixup when linking against the symbol rather than when writing
the module symbol table.

Until then I'd like to just remove HAVE_ARCH_PREL32_RELOCATIONS from
powerpc/Kconfig, but if other archs are going to have issues too, we
could just revert

271ca788774aa ("arch: enable relative relocations for arm64, power and x86")

arm64, x86 -- can the distance between your module percpu data link
location -> module percpu runtime allocation location exceed 31 bits?

Thanks,
Nick


Re: Oops running iptables -F OUTPUT

2018-08-28 Thread Andreas Schwab
On Aug 28 2018, Ard Biesheuvel  wrote:

> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 6a501b25dd85..57d09d5ceb1a 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -779,7 +779,6 @@ EXPORT_SYMBOL(__per_cpu_offset);
>
>  void __init setup_per_cpu_areas(void)
>  {
> -       const size_t dyn_size = PERCPU_MODULE_RESERVE + PERCPU_DYNAMIC_RESERVE;
>         size_t atom_size;
>         unsigned long delta;
>         unsigned int cpu;
> @@ -795,7 +794,9 @@ void __init setup_per_cpu_areas(void)
>         else
>                 atom_size = 1 << 20;
>
> -       rc = pcpu_embed_first_chunk(0, dyn_size, atom_size, pcpu_cpu_distance,
> +       rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
> +                                   PERCPU_DYNAMIC_RESERVE,
> +                                   atom_size, pcpu_cpu_distance,
>                                     pcpu_fc_alloc, pcpu_fc_free);
>         if (rc < 0)
>                 panic("cannot initialize percpu area (err=%d)", rc);

That didn't help.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: Oops running iptables -F OUTPUT

2018-08-28 Thread Ard Biesheuvel
On 28 August 2018 at 15:56, Ard Biesheuvel  wrote:
> Hello Andreas, Nick,
>
> On 28 August 2018 at 06:06, Nicholas Piggin  wrote:
>> On Mon, 27 Aug 2018 19:11:01 +0200
>> Andreas Schwab  wrote:
>>
>>> I'm getting this Oops when running iptables -F OUTPUT:
>>>
>>> [   91.139409] Unable to handle kernel paging request for data at address 
>>> 0xd001fff12f34
>>> [   91.139414] Faulting instruction address: 0xd16a5718
>>> [   91.139419] Oops: Kernel access of bad area, sig: 11 [#1]
>>> [   91.139426] BE SMP NR_CPUS=2 PowerMac
>>> [   91.139434] Modules linked in: iptable_filter ip_tables x_tables 
>>> bpfilter nfsd auth_rpcgss lockd grace nfs_acl sunrpc tun af_packet 
>>> snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus 
>>> snd_aoa_soundbus snd_pcm_oss snd_pcm snd_seq snd_timer snd_seq_device 
>>> snd_mixer_oss snd sungem sr_mod firewire_ohci cdrom sungem_phy soundcore 
>>> firewire_core pata_macio crc_itu_t sg hid_generic usbhid linear md_mod 
>>> ohci_pci ohci_hcd ehci_pci ehci_hcd usbcore usb_common dm_snapshot dm_bufio 
>>> dm_mirror dm_region_hash dm_log dm_mod sata_svw
>>> [   91.139522] CPU: 1 PID: 3620 Comm: iptables Not tainted 4.19.0-rc1 #1
>>> [   91.139526] NIP:  d16a5718 LR: d16a569c CTR: 
>>> c06f560c
>>> [   91.139531] REGS: c001fa577670 TRAP: 0300   Not tainted  (4.19.0-rc1)
>>> [   91.139534] MSR:  9200b032   CR: 
>>> 84002484  XER: 2000
>>> [   91.139553] DAR: d001fff12f34 DSISR: 4000 IRQMASK: 0
>>> GPR00: d16a569c c001fa5778f0 d16b0400 
>>> GPR04: 0002  8001fa46418e c001fa0d05c8
>>> GPR08: d16b0400 d00037f13000 0001ff3e7000 d16a6fb8
>>> GPR12: c06f560c c780  
>>> GPR16: 11635010 3fffa1b7aa68  
>>> GPR20: 0003 10013918 116350c0 c0b88990
>>> GPR24: c0b88ba4  d001fff12f34 
>>> GPR28: d16b8000 c001fa20f400 c001fa20f440 
>>> [   91.139627] NIP [d16a5718] .alloc_counters.isra.10+0xbc/0x140 
>>> [ip_tables]
>>> [   91.139634] LR [d16a569c] .alloc_counters.isra.10+0x40/0x140 
>>> [ip_tables]
>>> [   91.139638] Call Trace:
>>> [   91.139645] [c001fa5778f0] [d16a569c] 
>>> .alloc_counters.isra.10+0x40/0x140 [ip_tables] (unreliable)
>>> [   91.139655] [c001fa5779b0] [d16a5b54] 
>>> .do_ipt_get_ctl+0x110/0x2ec [ip_tables]
>>> [   91.139666] [c001fa577aa0] [c06233e0] 
>>> .nf_getsockopt+0x68/0x88
>>> [   91.139674] [c001fa577b40] [c0631608] 
>>> .ip_getsockopt+0xbc/0x128
>>> [   91.139682] [c001fa577bf0] [c065adf4] 
>>> .raw_getsockopt+0x18/0x5c
>>> [   91.139690] [c001fa577c60] [c05b5f60] 
>>> .sock_common_getsockopt+0x2c/0x40
>>> [   91.139697] [c001fa577cd0] [c05b3394] 
>>> .__sys_getsockopt+0xa4/0xd0
>>> [   91.139704] [c001fa577d80] [c05b5ab0] 
>>> .__se_sys_socketcall+0x238/0x2b4
>>> [   91.139712] [c001fa577e30] [c000a31c] system_call+0x5c/0x70
>>> [   91.139716] Instruction dump:
>>> [   91.139721] 39290040 7d3d4a14 7fbe4840 409cff98 8138 2b890001 
>>> 419d000c 393e0060
>>> [   91.139736] 4810 7d57c82a e93e0060 7d295214 <815a> 794807e1 
>>> 41e20010 7c210b78
>>> [   91.139752] ---[ end trace f5d1d5431651845d ]---
>>
>> This is due to 7290d58095 ("module: use relative references for
>> __ksymtab entries"). This part of kernel/module.c -
>>
>>     /* Divert to percpu allocation if a percpu var. */
>>     if (sym[i].st_shndx == info->index.pcpu)
>>             secbase = (unsigned long)mod_percpu(mod);
>>     else
>>             secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
>>     sym[i].st_value += secbase;
>>
>> Causes the distance to the target to exceed 32-bits on powerpc, so
>> it doesn't fit in a rel32 reloc. Not sure how other archs cope.
>>
>
> Apologies for the breakage. It does indeed appear to affect all
> architectures, and I'm a bit puzzled why you are the first one to spot
> it.
>
> I will try to find a clean way to special case the per-CPU variable
> __ksymtab references in the generic module code, and if that is too
> cumbersome, we can switch to 64-bit relative references (or rather,
> native word size relative references) instead. Or revert the whole
> thing ...

OK, after a bit of digging, and confirming that the arm64
implementation works as expected (its module loader actually detects
overflows of the 32-bit place relative relocations, so the problem
definitely does not occur there), I think I found the explanation why
this occurs on powerpc and not on x86 or arm64.
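
To illustrate what "detects overflows of the 32-bit place relative
relocations" means, here is a minimal user-space sketch of such a range
check. It is not the arm64 module loader code: the function name
apply_prel32, the use of plain integers as addresses, and the -ERANGE
return value are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: write a 32-bit place-relative reference into *field.
 * "place" is the address of the field itself, "target" the address it
 * should point at. Returns 0 on success, or -ERANGE (an arbitrary choice
 * for this sketch) if the distance does not fit in a signed 32-bit value.
 */
static int apply_prel32(int32_t *field, uint64_t place, uint64_t target)
{
	/* Full-width signed distance; assumes two's complement wraparound. */
	int64_t delta = (int64_t)(target - place);

	if (delta < INT32_MIN || delta > INT32_MAX)
		return -ERANGE;	/* refuse instead of silently truncating */

	*field = (int32_t)delta;
	return 0;
}
```

A loader that performs a check of this kind rejects the module at load
time, so an out-of-range reference never reaches a later dereference.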

Could you please check whether this change makes the issue go away?
(whitespace damage courtesy of Gmail)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 6a501b25dd85..57d09d5ceb1a 

Re: Oops running iptables -F OUTPUT

2018-08-28 Thread Ard Biesheuvel
Hello Andreas, Nick,

On 28 August 2018 at 06:06, Nicholas Piggin  wrote:
> On Mon, 27 Aug 2018 19:11:01 +0200
> Andreas Schwab  wrote:
>
>> I'm getting this Oops when running iptables -F OUTPUT:
>>
>> [   91.139409] Unable to handle kernel paging request for data at address 
>> 0xd001fff12f34
>> [   91.139414] Faulting instruction address: 0xd16a5718
>> [   91.139419] Oops: Kernel access of bad area, sig: 11 [#1]
>> [   91.139426] BE SMP NR_CPUS=2 PowerMac
>> [   91.139434] Modules linked in: iptable_filter ip_tables x_tables bpfilter 
>> nfsd auth_rpcgss lockd grace nfs_acl sunrpc tun af_packet snd_aoa_codec_tas 
>> snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus snd_aoa_soundbus snd_pcm_oss 
>> snd_pcm snd_seq snd_timer snd_seq_device snd_mixer_oss snd sungem sr_mod 
>> firewire_ohci cdrom sungem_phy soundcore firewire_core pata_macio crc_itu_t 
>> sg hid_generic usbhid linear md_mod ohci_pci ohci_hcd ehci_pci ehci_hcd 
>> usbcore usb_common dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log 
>> dm_mod sata_svw
>> [   91.139522] CPU: 1 PID: 3620 Comm: iptables Not tainted 4.19.0-rc1 #1
>> [   91.139526] NIP:  d16a5718 LR: d16a569c CTR: 
>> c06f560c
>> [   91.139531] REGS: c001fa577670 TRAP: 0300   Not tainted  (4.19.0-rc1)
>> [   91.139534] MSR:  9200b032   CR: 
>> 84002484  XER: 2000
>> [   91.139553] DAR: d001fff12f34 DSISR: 4000 IRQMASK: 0
>> GPR00: d16a569c c001fa5778f0 d16b0400 
>> GPR04: 0002  8001fa46418e c001fa0d05c8
>> GPR08: d16b0400 d00037f13000 0001ff3e7000 d16a6fb8
>> GPR12: c06f560c c780  
>> GPR16: 11635010 3fffa1b7aa68  
>> GPR20: 0003 10013918 116350c0 c0b88990
>> GPR24: c0b88ba4  d001fff12f34 
>> GPR28: d16b8000 c001fa20f400 c001fa20f440 
>> [   91.139627] NIP [d16a5718] .alloc_counters.isra.10+0xbc/0x140 
>> [ip_tables]
>> [   91.139634] LR [d16a569c] .alloc_counters.isra.10+0x40/0x140 
>> [ip_tables]
>> [   91.139638] Call Trace:
>> [   91.139645] [c001fa5778f0] [d16a569c] 
>> .alloc_counters.isra.10+0x40/0x140 [ip_tables] (unreliable)
>> [   91.139655] [c001fa5779b0] [d16a5b54] 
>> .do_ipt_get_ctl+0x110/0x2ec [ip_tables]
>> [   91.139666] [c001fa577aa0] [c06233e0] .nf_getsockopt+0x68/0x88
>> [   91.139674] [c001fa577b40] [c0631608] 
>> .ip_getsockopt+0xbc/0x128
>> [   91.139682] [c001fa577bf0] [c065adf4] 
>> .raw_getsockopt+0x18/0x5c
>> [   91.139690] [c001fa577c60] [c05b5f60] 
>> .sock_common_getsockopt+0x2c/0x40
>> [   91.139697] [c001fa577cd0] [c05b3394] 
>> .__sys_getsockopt+0xa4/0xd0
>> [   91.139704] [c001fa577d80] [c05b5ab0] 
>> .__se_sys_socketcall+0x238/0x2b4
>> [   91.139712] [c001fa577e30] [c000a31c] system_call+0x5c/0x70
>> [   91.139716] Instruction dump:
>> [   91.139721] 39290040 7d3d4a14 7fbe4840 409cff98 8138 2b890001 
>> 419d000c 393e0060
>> [   91.139736] 4810 7d57c82a e93e0060 7d295214 <815a> 794807e1 
>> 41e20010 7c210b78
>> [   91.139752] ---[ end trace f5d1d5431651845d ]---
>
> This is due to 7290d58095 ("module: use relative references for
> __ksymtab entries"). This part of kernel/module.c -
>
>     /* Divert to percpu allocation if a percpu var. */
>     if (sym[i].st_shndx == info->index.pcpu)
>             secbase = (unsigned long)mod_percpu(mod);
>     else
>             secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
>     sym[i].st_value += secbase;
>
> Causes the distance to the target to exceed 32-bits on powerpc, so
> it doesn't fit in a rel32 reloc. Not sure how other archs cope.
>

Apologies for the breakage. It does indeed appear to affect all
architectures, and I'm a bit puzzled why you are the first one to spot
it.

I will try to find a clean way to special case the per-CPU variable
__ksymtab references in the generic module code, and if that is too
cumbersome, we can switch to 64-bit relative references (or rather,
native word size relative references) instead. Or revert the whole
thing ...
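
For comparison, a rough sketch of what "native word size relative
references" could look like in user space. The names relword_encode and
relword_resolve are hypothetical and do not correspond to any existing
kernel helper; addresses are modelled as plain integers.

```c
#include <stdint.h>

/*
 * Hypothetical encoding: store the distance from the entry ("place") to
 * the symbol ("target") in a signed integer as wide as a pointer.
 * Assumes the usual two's complement conversion to intptr_t.
 */
static intptr_t relword_encode(uintptr_t place, uintptr_t target)
{
	return (intptr_t)(target - place);
}

/* Recover the target address from the entry address and stored offset. */
static uintptr_t relword_resolve(uintptr_t place, intptr_t offset)
{
	return place + (uintptr_t)offset;
}
```

Because the offset is as wide as an address, the round trip holds for any
pair of addresses. The cost is that each entry takes a full word again, so
on 64-bit the size saving of a 32-bit offset is lost.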


Re: Oops running iptables -F OUTPUT

2018-08-27 Thread Nicholas Piggin
On Mon, 27 Aug 2018 19:11:01 +0200
Andreas Schwab  wrote:

> I'm getting this Oops when running iptables -F OUTPUT:
> 
> [   91.139409] Unable to handle kernel paging request for data at address 
> 0xd001fff12f34
> [   91.139414] Faulting instruction address: 0xd16a5718
> [   91.139419] Oops: Kernel access of bad area, sig: 11 [#1]
> [   91.139426] BE SMP NR_CPUS=2 PowerMac
> [   91.139434] Modules linked in: iptable_filter ip_tables x_tables bpfilter 
> nfsd auth_rpcgss lockd grace nfs_acl sunrpc tun af_packet snd_aoa_codec_tas 
> snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus snd_aoa_soundbus snd_pcm_oss 
> snd_pcm snd_seq snd_timer snd_seq_device snd_mixer_oss snd sungem sr_mod 
> firewire_ohci cdrom sungem_phy soundcore firewire_core pata_macio crc_itu_t 
> sg hid_generic usbhid linear md_mod ohci_pci ohci_hcd ehci_pci ehci_hcd 
> usbcore usb_common dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log 
> dm_mod sata_svw
> [   91.139522] CPU: 1 PID: 3620 Comm: iptables Not tainted 4.19.0-rc1 #1
> [   91.139526] NIP:  d16a5718 LR: d16a569c CTR: 
> c06f560c
> [   91.139531] REGS: c001fa577670 TRAP: 0300   Not tainted  (4.19.0-rc1)
> [   91.139534] MSR:  9200b032   CR: 
> 84002484  XER: 2000
> [   91.139553] DAR: d001fff12f34 DSISR: 4000 IRQMASK: 0 
> GPR00: d16a569c c001fa5778f0 d16b0400  
> GPR04: 0002  8001fa46418e c001fa0d05c8 
> GPR08: d16b0400 d00037f13000 0001ff3e7000 d16a6fb8 
> GPR12: c06f560c c780   
> GPR16: 11635010 3fffa1b7aa68   
> GPR20: 0003 10013918 116350c0 c0b88990 
> GPR24: c0b88ba4  d001fff12f34  
> GPR28: d16b8000 c001fa20f400 c001fa20f440  
> [   91.139627] NIP [d16a5718] .alloc_counters.isra.10+0xbc/0x140 
> [ip_tables]
> [   91.139634] LR [d16a569c] .alloc_counters.isra.10+0x40/0x140 
> [ip_tables]
> [   91.139638] Call Trace:
> [   91.139645] [c001fa5778f0] [d16a569c] 
> .alloc_counters.isra.10+0x40/0x140 [ip_tables] (unreliable)
> [   91.139655] [c001fa5779b0] [d16a5b54] 
> .do_ipt_get_ctl+0x110/0x2ec [ip_tables]
> [   91.139666] [c001fa577aa0] [c06233e0] .nf_getsockopt+0x68/0x88
> [   91.139674] [c001fa577b40] [c0631608] .ip_getsockopt+0xbc/0x128
> [   91.139682] [c001fa577bf0] [c065adf4] .raw_getsockopt+0x18/0x5c
> [   91.139690] [c001fa577c60] [c05b5f60] 
> .sock_common_getsockopt+0x2c/0x40
> [   91.139697] [c001fa577cd0] [c05b3394] 
> .__sys_getsockopt+0xa4/0xd0
> [   91.139704] [c001fa577d80] [c05b5ab0] 
> .__se_sys_socketcall+0x238/0x2b4
> [   91.139712] [c001fa577e30] [c000a31c] system_call+0x5c/0x70
> [   91.139716] Instruction dump:
> [   91.139721] 39290040 7d3d4a14 7fbe4840 409cff98 8138 2b890001 419d000c 
> 393e0060 
> [   91.139736] 4810 7d57c82a e93e0060 7d295214 <815a> 794807e1 
> 41e20010 7c210b78 
> [   91.139752] ---[ end trace f5d1d5431651845d ]---

This is due to 7290d58095 ("module: use relative references for
__ksymtab entries"). This part of kernel/module.c -

    /* Divert to percpu allocation if a percpu var. */
    if (sym[i].st_shndx == info->index.pcpu)
            secbase = (unsigned long)mod_percpu(mod);
    else
            secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
    sym[i].st_value += secbase;

Causes the distance to the target to exceed 32 bits on powerpc, so
it doesn't fit in a rel32 reloc. Not sure how other archs cope.
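
The failure mode can be sketched in user space as follows. This is only an
illustration, not the kernel code: the helper names are hypothetical,
addresses are plain integers, and any addresses fed to it are made up
rather than taken from the oops above.

```c
#include <stdint.h>

/* Signed distance from the entry ("place") to the symbol, at full width. */
static int64_t rel_distance(uint64_t place, uint64_t target)
{
	return (int64_t)(target - place);
}

/*
 * What a 32-bit relative field ends up holding: only the low 32 bits of
 * the distance. The out-of-range conversion to int32_t is
 * implementation-defined in C; GCC and Clang wrap modulo 2^32.
 */
static int32_t rel32_encode(uint64_t place, uint64_t target)
{
	return (int32_t)(uint32_t)(target - place);
}

/* Resolve the reference again: place plus the sign-extended offset. */
static uint64_t rel32_resolve(uint64_t place, int32_t offset)
{
	return place + (uint64_t)(int64_t)offset;
}
```

When rel_distance() fits in 32 bits, rel32_resolve() gives back the
original target. When it does not, the upper bits are dropped and the
entry resolves to some unrelated address, which would be consistent with
a fault on a bogus data address like the one reported.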

Thanks,
Nick


Oops running iptables -F OUTPUT

2018-08-27 Thread Andreas Schwab
I'm getting this Oops when running iptables -F OUTPUT:

[   91.139409] Unable to handle kernel paging request for data at address 
0xd001fff12f34
[   91.139414] Faulting instruction address: 0xd16a5718
[   91.139419] Oops: Kernel access of bad area, sig: 11 [#1]
[   91.139426] BE SMP NR_CPUS=2 PowerMac
[   91.139434] Modules linked in: iptable_filter ip_tables x_tables bpfilter 
nfsd auth_rpcgss lockd grace nfs_acl sunrpc tun af_packet snd_aoa_codec_tas 
snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus snd_aoa_soundbus snd_pcm_oss 
snd_pcm snd_seq snd_timer snd_seq_device snd_mixer_oss snd sungem sr_mod 
firewire_ohci cdrom sungem_phy soundcore firewire_core pata_macio crc_itu_t sg 
hid_generic usbhid linear md_mod ohci_pci ohci_hcd ehci_pci ehci_hcd usbcore 
usb_common dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod sata_svw
[   91.139522] CPU: 1 PID: 3620 Comm: iptables Not tainted 4.19.0-rc1 #1
[   91.139526] NIP:  d16a5718 LR: d16a569c CTR: c06f560c
[   91.139531] REGS: c001fa577670 TRAP: 0300   Not tainted  (4.19.0-rc1)
[   91.139534] MSR:  9200b032   CR: 
84002484  XER: 2000
[   91.139553] DAR: d001fff12f34 DSISR: 4000 IRQMASK: 0 
GPR00: d16a569c c001fa5778f0 d16b0400  
GPR04: 0002  8001fa46418e c001fa0d05c8 
GPR08: d16b0400 d00037f13000 0001ff3e7000 d16a6fb8 
GPR12: c06f560c c780   
GPR16: 11635010 3fffa1b7aa68   
GPR20: 0003 10013918 116350c0 c0b88990 
GPR24: c0b88ba4  d001fff12f34  
GPR28: d16b8000 c001fa20f400 c001fa20f440  
[   91.139627] NIP [d16a5718] .alloc_counters.isra.10+0xbc/0x140 
[ip_tables]
[   91.139634] LR [d16a569c] .alloc_counters.isra.10+0x40/0x140 
[ip_tables]
[   91.139638] Call Trace:
[   91.139645] [c001fa5778f0] [d16a569c] 
.alloc_counters.isra.10+0x40/0x140 [ip_tables] (unreliable)
[   91.139655] [c001fa5779b0] [d16a5b54] 
.do_ipt_get_ctl+0x110/0x2ec [ip_tables]
[   91.139666] [c001fa577aa0] [c06233e0] .nf_getsockopt+0x68/0x88
[   91.139674] [c001fa577b40] [c0631608] .ip_getsockopt+0xbc/0x128
[   91.139682] [c001fa577bf0] [c065adf4] .raw_getsockopt+0x18/0x5c
[   91.139690] [c001fa577c60] [c05b5f60] 
.sock_common_getsockopt+0x2c/0x40
[   91.139697] [c001fa577cd0] [c05b3394] .__sys_getsockopt+0xa4/0xd0
[   91.139704] [c001fa577d80] [c05b5ab0] 
.__se_sys_socketcall+0x238/0x2b4
[   91.139712] [c001fa577e30] [c000a31c] system_call+0x5c/0x70
[   91.139716] Instruction dump:
[   91.139721] 39290040 7d3d4a14 7fbe4840 409cff98 8138 2b890001 419d000c 
393e0060 
[   91.139736] 4810 7d57c82a e93e0060 7d295214 <815a> 794807e1 41e20010 
7c210b78 
[   91.139752] ---[ end trace f5d1d5431651845d ]---


Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."