Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-12 Thread Borislav Petkov
On Thu, Mar 09, 2017 at 03:26:02PM -0800, Linus Torvalds wrote:
> Maybe it's the lguest games with PGE that need to be removed?

Btw, tglx suggested something else the other day: warn when we're changing boot_cpu_data x86_capability bits *after* alternatives have run. The reasoning behind it being

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Daniel Borkmann
On 03/10/2017 12:44 AM, Borislav Petkov wrote:
> On Thu, Mar 09, 2017 at 03:26:02PM -0800, Linus Torvalds wrote:
>> So should all of commit ("c109bf95992b x86/cpufeature: Remove cpu_has_pge") just be reverted (and then marked for stable)?
>>
>> Or do we have some alternate plan?
>
> I think we want to do

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Borislav Petkov
On Thu, Mar 09, 2017 at 03:26:02PM -0800, Linus Torvalds wrote:
> So should all of commit ("c109bf95992b x86/cpufeature: Remove
> cpu_has_pge") just be reverted (and then marked for stable)?
>
> Or do we have some alternate plan?

I think we want to do this:

diff --git

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Linus Torvalds
On Thu, Mar 9, 2017 at 2:48 PM, Borislav Petkov wrote:
>
> I guess we could return to doing boot_cpu_has() in __flush_tlb_all()
> then. I mean, the timing-sensitivity argument is meh - killing global
> TLB entries a bit faster doesn't bring me a whole lot when I have to go
> and

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Borislav Petkov
On Thu, Mar 09, 2017 at 11:11:17PM +0100, Daniel Borkmann wrote:
> Yeah, I just tried that out and it had no effect unfortunately, the
> static_cpu_has() was still 1.

Right, just as I thought.

I guess we could return to doing boot_cpu_has() in __flush_tlb_all() then. I mean, the
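
[ Archive note: the observation quoted above, that static_cpu_has() was still 1 after the capability bit had been cleared, can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: struct fake_boot_cpu, FEATURE_PGE and every fake_*() function are hypothetical names, and the model assumes the "static" check is frozen to whatever the capability bits were when alternatives ran, while the "boot" check reads the live bits. ]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_PGE (1u << 0)   /* hypothetical bit position, for illustration only */

/* Hypothetical model of the boot CPU's capability state. */
struct fake_boot_cpu {
    uint32_t caps;          /* live capability bits */
    bool     patched;       /* has the "alternatives" pass run? */
    uint32_t patched_caps;  /* snapshot frozen in at patch time */
};

/* Model of the alternatives pass: freeze the current bits. */
static void fake_apply_alternatives(struct fake_boot_cpu *c)
{
    c->patched_caps = c->caps;
    c->patched = true;
}

/* Dynamic check: always consults the live bits. */
static bool fake_boot_cpu_has(const struct fake_boot_cpu *c, uint32_t bit)
{
    return (c->caps & bit) != 0;
}

/* "Static" check: only sees what was frozen at patch time. */
static bool fake_static_cpu_has(const struct fake_boot_cpu *c, uint32_t bit)
{
    assert(c->patched);     /* the model only covers the post-patch case */
    return (c->patched_caps & bit) != 0;
}

/* Clearing a bit late changes the live bits but not the snapshot. */
static void fake_clear_cpu_cap(struct fake_boot_cpu *c, uint32_t bit)
{
    c->caps &= ~bit;
}
```

In this model, clearing FEATURE_PGE after fake_apply_alternatives() leaves fake_static_cpu_has() returning true while fake_boot_cpu_has() returns false, which is the kind of disagreement a dynamic check in __flush_tlb_all() would sidestep.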

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Daniel Borkmann
On 03/09/2017 11:07 PM, Borislav Petkov wrote:
> On Thu, Mar 09, 2017 at 10:55:47PM +0100, Borislav Petkov wrote:
>> Can you make that:
>>
>> setup_clear_cpu_cap(X86_FEATURE_PGE);
>>
>> and see if it fixes your issue?
>
> Hmm, in reading the thread a bit more, that might not work. If I see it correctly,

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Borislav Petkov
On Thu, Mar 09, 2017 at 10:32:12PM +0100, Daniel Borkmann wrote:
> get_online_cpus();
> if (boot_cpu_has(X86_FEATURE_PGE)) { /* We have a broader idea of
> "global". */
>         /* Remember that this was originally set (for cleanup). */
>         cpu_had_pge = 1;
>

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Borislav Petkov
On Thu, Mar 09, 2017 at 10:55:47PM +0100, Borislav Petkov wrote:
> Can you make that:
>
> setup_clear_cpu_cap(X86_FEATURE_PGE);
>
> and see if it fixes your issue?

Hmm, in reading the thread a bit more, that might not work. If I see it correctly, lguest does clear_cpu_cap(_cpu_data,

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Daniel Borkmann
[ + Borislav ]

On 03/09/2017 07:31 PM, Daniel Borkmann wrote:
> On 03/09/2017 07:15 PM, Linus Torvalds wrote:
>> On Thu, Mar 9, 2017 at 10:10 AM, Linus Torvalds wrote:
>>> Very odd. We should always have PGE (0x0080) set in cr4 (if the CPU supports it).
>>
>> Daniel, do

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Daniel Borkmann
On 03/09/2017 07:15 PM, Linus Torvalds wrote:
> On Thu, Mar 9, 2017 at 10:10 AM, Linus Torvalds wrote:
>> Very odd. We should always have PGE (0x0080) set in cr4 (if the CPU supports it).
>
> Daniel, do you see the code in probe_page_size_mask() triggering?

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Linus Torvalds
On Thu, Mar 9, 2017 at 10:10 AM, Linus Torvalds wrote:
>
> Very odd. We should always have PGE (0x0080) set in cr4 (if the CPU
> supports it).

Daniel, do you see the code in probe_page_size_mask() triggering?

/* Enable PGE if available */
if

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Linus Torvalds
On Thu, Mar 9, 2017 at 9:51 AM, Daniel Borkmann wrote:
>
> What I see is that original cr4 is 0x610. The cpu_tlbstate.cr4 is
> consistent to native_read_cr4() and since cr4 is != 0, it tells me
> based on the comment in native_read_cr4() that cr4 seems to be
> supported.
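
[ Archive note: the numbers quoted in this part of the thread can be checked by hand. PGE is bit 0x0080 and the observed cr4 is 0x610, so 0x610 & 0x0080 == 0: the bit is not set. If the global flush works by clearing PGE and then writing the original cr4 back, both writes would carry the same value in that case. The following is only a userspace model of that arithmetic, not the kernel's code. struct fake_cpu, fake_write_cr4(), fake_flush_global() and flushes_for() are made-up names, and the model assumes that a global flush happens exactly when a write changes the PGE bit. ]

```c
#include <assert.h>
#include <stdint.h>

/* PGE bit in CR4, using the value quoted in the thread. */
#define X86_CR4_PGE 0x0080u

/* Hypothetical stand-in for a CPU's CR4 register. */
struct fake_cpu {
    uint32_t     cr4;
    unsigned int global_flushes;  /* writes that changed the PGE bit */
};

/* Model assumption: only a write that flips PGE flushes global entries. */
static void fake_write_cr4(struct fake_cpu *cpu, uint32_t val)
{
    if ((cpu->cr4 ^ val) & X86_CR4_PGE)
        cpu->global_flushes++;
    cpu->cr4 = val;
}

/* Model of the toggle: clear PGE, then restore the original value.
 * Returns how many of the two writes actually changed PGE. */
static unsigned int fake_flush_global(struct fake_cpu *cpu)
{
    uint32_t cr4 = cpu->cr4;
    unsigned int before = cpu->global_flushes;

    fake_write_cr4(cpu, cr4 & ~X86_CR4_PGE);
    fake_write_cr4(cpu, cr4);
    assert(cpu->cr4 == cr4);      /* CR4 ends up unchanged */
    return cpu->global_flushes - before;
}

/* Convenience wrapper: run the toggle from a given starting CR4. */
static unsigned int flushes_for(uint32_t cr4)
{
    struct fake_cpu cpu = { .cr4 = cr4, .global_flushes = 0 };
    return fake_flush_global(&cpu);
}
```

In this model flushes_for(0x610) is 0, because neither write changes PGE, while a starting value with the bit set, such as 0x690, gives 2. That matches the suspicion in the thread that a cr4 without PGE turns the toggle into a no-op.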

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread David Miller
From: Daniel Borkmann
Date: Thu, 09 Mar 2017 18:51:03 +0100

> I added some debugging around __native_flush_tlb_global_irq_disabled()
> and if I understand it correctly, the idea of cr4 is that we need to
> toggle X86_CR4_PGE in order to trigger a TLB flush.
>
> What I see

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Daniel Borkmann
On 03/09/2017 03:49 PM, Thomas Gleixner wrote:
> On Thu, 9 Mar 2017, Daniel Borkmann wrote:
>> On 03/09/2017 02:10 PM, Thomas Gleixner wrote:
>>> On Thu, 9 Mar 2017, Daniel Borkmann wrote:
>>>> With regard to CPA_FLUSHTLB that Linus mentioned, when I investigated code paths in change_page_attr_set_clr(), I

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Linus Torvalds
On Thu, Mar 9, 2017 at 6:53 AM, Daniel Borkmann wrote:
>
> Fwiw, I tried switching from using cr4
> (__native_flush_tlb_global_irq_disabled())
> to slower cr3 (__native_flush_tlb()) in "-cpu kvm64" mode, and it looks like
> it also lets all test cases pass (rodata_test,

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Daniel Borkmann
On 03/09/2017 02:25 PM, Daniel Borkmann wrote:
> On 03/09/2017 02:10 PM, Thomas Gleixner wrote:
>> On Thu, 9 Mar 2017, Daniel Borkmann wrote:
>>> With regard to CPA_FLUSHTLB that Linus mentioned, when I investigated code paths in change_page_attr_set_clr(), I did see that CPA_FLUSHTLB was set each time

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Thomas Gleixner
On Thu, 9 Mar 2017, Daniel Borkmann wrote:
> On 03/09/2017 02:10 PM, Thomas Gleixner wrote:
> > On Thu, 9 Mar 2017, Daniel Borkmann wrote:
> > > With regard to CPA_FLUSHTLB that Linus mentioned, when I investigated
> > > code paths in change_page_attr_set_clr(), I did see that CPA_FLUSHTLB
> > >

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Daniel Borkmann
On 03/09/2017 02:10 PM, Thomas Gleixner wrote:
> On Thu, 9 Mar 2017, Daniel Borkmann wrote:
>> With regard to CPA_FLUSHTLB that Linus mentioned, when I investigated code paths in change_page_attr_set_clr(), I did see that CPA_FLUSHTLB was set each time we switched attrs and a cpa_flush_range() was

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Thomas Gleixner
On Wed, 8 Mar 2017, Linus Torvalds wrote:
> Adding x86 people too, since this seems to be something off about
> ARCH_HAS_SET_MEMORY for x86-32.
>
> The code seems to be shared between x86-32 and 64, I'm not seeing why
> set_memory_r[ow]() should fail on one but not the other.

Indeed.

>

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Thomas Gleixner
On Thu, 9 Mar 2017, Daniel Borkmann wrote:
> With regard to CPA_FLUSHTLB that Linus mentioned, when I investigated
> code paths in change_page_attr_set_clr(), I did see that CPA_FLUSHTLB
> was set each time we switched attrs and a cpa_flush_range() was
> performed (with the correct number of pages

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-09 Thread Daniel Borkmann
On 03/09/2017 06:36 AM, Kees Cook wrote:
> On Wed, Mar 8, 2017 at 3:55 PM, Laura Abbott wrote:
>> On 03/08/2017 02:36 PM, Kees Cook wrote:
>>> On Wed, Mar 8, 2017 at 2:27 PM, Daniel Borkmann wrote:
>>>> [ 28.474232] rodata_test: test data was not read only

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-08 Thread Kees Cook
On Wed, Mar 8, 2017 at 3:55 PM, Laura Abbott wrote:
> On 03/08/2017 02:36 PM, Kees Cook wrote:
>> On Wed, Mar 8, 2017 at 2:27 PM, Daniel Borkmann wrote:
>>> [ 28.474232] rodata_test: test data was not read only
>>> [...]
>>
>> In my tests so far, I've

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-08 Thread Fengguang Wu
On Wed, Mar 08, 2017 at 02:43:44PM -0800, Linus Torvalds wrote:
> On Wed, Mar 8, 2017 at 2:27 PM, Daniel Borkmann wrote:
>> The issue seems to be accessing buff first (can be read or write access) and then doing set_memory_ro() doesn't make it read-only immediately, meaning

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-08 Thread Laura Abbott
On 03/08/2017 02:36 PM, Kees Cook wrote:
> On Wed, Mar 8, 2017 at 2:27 PM, Daniel Borkmann wrote:
>> [ 28.474232] rodata_test: test data was not read only
>> [...]
>
> In my tests so far, I've never been able to get rodata_test to fail
> (Qemu 2.5.0, Ubuntu). I'll retry

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-08 Thread Daniel Borkmann
On 03/08/2017 11:36 PM, Kees Cook wrote:
> On Wed, Mar 8, 2017 at 2:27 PM, Daniel Borkmann wrote:
>> [ 28.474232] rodata_test: test data was not read only
>> [...]
>
> In my tests so far, I've never been able to get rodata_test to fail (Qemu 2.5.0, Ubuntu). I'll retry with your

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-08 Thread Linus Torvalds
On Wed, Mar 8, 2017 at 2:27 PM, Daniel Borkmann wrote:
>
> The issue seems to be accessing buff first (can be read or write access)
> and then doing set_memory_ro() doesn't make it read-only immediately,
> meaning the subsequent call into probe_kernel_write() will succeed

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-08 Thread Kees Cook
On Wed, Mar 8, 2017 at 2:27 PM, Daniel Borkmann wrote:
> [ 28.474232] rodata_test: test data was not read only
> [...]

In my tests so far, I've never been able to get rodata_test to fail (Qemu 2.5.0, Ubuntu). I'll retry with your .config and see if I can recheck under

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-08 Thread Daniel Borkmann
[ + Kees, Laura, and Dave ]

On 03/08/2017 08:25 PM, Linus Torvalds wrote:
> Adding x86 people too, since this seems to be something off about ARCH_HAS_SET_MEMORY for x86-32.
>
> The code seems to be shared between x86-32 and 64, I'm not seeing why set_memory_r[ow]() should fail on one but not the

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-08 Thread Linus Torvalds
Adding x86 people too, since this seems to be something off about ARCH_HAS_SET_MEMORY for x86-32.

The code seems to be shared between x86-32 and 64, I'm not seeing why set_memory_r[ow]() should fail on one but not the other. Considering that it seems to be flaky even on 32-bit, maybe it's

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-02 Thread Daniel Borkmann
On 03/02/2017 09:23 PM, Fengguang Wu wrote:
> [...]
> I confirm that the below patch provided by Daniel fixes the above issues on mainline kernel, too. Where should this patch be sent to?

If nobody objects, I could send it to -net tree via Dave due to being BPF related, but I don't mind sending it

Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

2017-03-02 Thread Fengguang Wu
On Wed, Mar 01, 2017 at 08:54:26PM +0800, Fengguang Wu wrote: Hi all, Is it BPF triggering BUGs all over the places? It looks so, and here is a fix. 1e74a2eb1f Merge tag 'gcc-plugins-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux 005c3490e9 Revert "ath10k: Search