Re: Linux 4.17-rc1 - kernel paging errors running x86 selftests

2018-04-17 Thread Shuah Khan
On 04/16/2018 12:34 PM, Linus Torvalds wrote:
> On Mon, Apr 16, 2018 at 11:15 AM, Linus Torvalds wrote:
>>
>> Ingo/Thomas: I will be just taking this directly, since it's so
>> trivial and obvious and I got cc'd on the discussion.
> 
> .. and I also verified that it actually fixes the problem Shuah
> reported. Not that there really was any question about it, but hey,
> after bisecting it I decided to just test the fix too.

Awesome. Thanks.

-- Shuah


Re: Linux 4.17-rc1 - kernel paging errors running x86 selftests

2018-04-16 Thread Ingo Molnar

* Linus Torvalds wrote:

> On Mon, Apr 16, 2018 at 11:04 AM, Linus Torvalds wrote:
> >
> > I was going through the bisection, with just a couple more rounds to
> > go, but I guess I don't even need it.
> 
> Ingo/Thomas: I will be just taking this directly, since it's so
> trivial and obvious and I got cc'd on the discussion.

A belated Ack - and thanks for applying the fix!

Ingo


Re: Linux 4.17-rc1 - kernel paging errors running x86 selftests

2018-04-16 Thread Linus Torvalds
On Mon, Apr 16, 2018 at 11:15 AM, Linus Torvalds wrote:
>
> Ingo/Thomas: I will be just taking this directly, since it's so
> trivial and obvious and I got cc'd on the discussion.

.. and I also verified that it actually fixes the problem Shuah
reported. Not that there really was any question about it, but hey,
after bisecting it I decided to just test the fix too.

I know, I know. What are users for? I must be slipping.

   Linus


Re: Linux 4.17-rc1 - kernel paging errors running x86 selftests

2018-04-16 Thread Linus Torvalds
On Mon, Apr 16, 2018 at 11:04 AM, Linus Torvalds wrote:
>
> I was going through the bisection, with just a couple more rounds to
> go, but I guess I don't even need it.

Ingo/Thomas: I will be just taking this directly, since it's so
trivial and obvious and I got cc'd on the discussion.

Linus


Re: Linux 4.17-rc1 - kernel paging errors running x86 selftests

2018-04-16 Thread Linus Torvalds
On Mon, Apr 16, 2018 at 10:58 AM, Dave Hansen wrote:
>
> Joerg just found and fixed something that would be poked by the x86
> selftests:
>
> https://lkml.org/lkml/2018/4/16/230

Yup.

And that silly bug explains the all-ones PTE.

I was going through the bisection, with just a couple more rounds to
go, but I guess I don't even need it.

  Linus


Re: Linux 4.17-rc1 - kernel paging errors running x86 selftests

2018-04-16 Thread Dave Hansen
On 04/16/2018 10:56 AM, Linus Torvalds wrote:
> On Mon, Apr 16, 2018 at 10:43 AM, Linus Torvalds wrote:
>>
>> I don't see *why* it would be badly set up, and that test works fine
>> for me, though.
> 
> AHHAH!
> 
> I'm wrong. I can see it too. My desktop was running 18b7fd1c93e5 (my
> kernel from Saturday, I hadn't rebooted it since), but I had 4.17-rc1
> in kvmtool and on my laptop, and I see the problem in both cases.
> 
> So this came in recently, and I bet it's the global pages series from
> Dave Hansen, although there were a few other things that came in
> during the last day.

Joerg just found and fixed something that would be poked by the x86
selftests:

https://lkml.org/lkml/2018/4/16/230


Re: Linux 4.17-rc1 - kernel paging errors running x86 selftests

2018-04-16 Thread Linus Torvalds
On Mon, Apr 16, 2018 at 10:43 AM, Linus Torvalds wrote:
>
> I don't see *why* it would be badly set up, and that test works fine
> for me, though.

AHHAH!

I'm wrong. I can see it too. My desktop was running 18b7fd1c93e5 (my
kernel from Saturday, I hadn't rebooted it since), but I had 4.17-rc1
in kvmtool and on my laptop, and I see the problem in both cases.

So this came in recently, and I bet it's the global pages series from
Dave Hansen, although there were a few other things that came in
during the last day.

That should make it easy to bisect, there's only a handful of x86 changes.

Linus


Re: Linux 4.17-rc1 - kernel paging errors running x86 selftests

2018-04-16 Thread Linus Torvalds
On Mon, Apr 16, 2018 at 10:01 AM, Shuah Khan wrote:
>
> [  884.496588] BUG: unable to handle kernel paging request at fe810030

This is the LDT remap area.

> [  884.496614] Oops: 0009 [#1] SMP KASAN PTI

This is RSVD + P, so it's a system read access that got a protection
fault due to reserved bits.
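
For readers following along, the "RSVD + P" reading of the Oops value can be
reproduced with a small decoder. This is only an illustrative sketch, not
kernel code: the bit positions are the architectural x86 page-fault error
code bits, while the table name PF_BITS and the helper decode_pf_error are
made up for this example.

```python
# x86 page-fault error code bits (architectural layout).
# PF_BITS and decode_pf_error are hypothetical names used for illustration.
PF_BITS = [
    (0, "P"),     # 1 = protection violation, 0 = page not present
    (1, "W"),     # 1 = write access, 0 = read access
    (2, "U"),     # 1 = user-mode access, 0 = supervisor access
                  #     (includes implicit system accesses such as LDT loads)
    (3, "RSVD"),  # 1 = a reserved bit was set in a paging-structure entry
    (4, "I"),     # 1 = instruction fetch
]

def decode_pf_error(code):
    """Return the names of the bits set in a page-fault error code."""
    return [name for bit, name in PF_BITS if code & (1 << bit)]

# The Oops line prints the error code in hex: "Oops: 0009".
print(decode_pf_error(0x0009))  # ['P', 'RSVD']
```

With W and U both clear, the value reads as a supervisor read access, which
matches the description above.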

> [  884.496741] RIP: 0033:0x4031c2
> [  884.496745] RSP: 002b:7ffd805b56d8 EFLAGS: 00010246

This is not actually a kernel paging request, it's all user space, but
it's user space that does a system access.

That's normal - something loading a segment in user space, and thus
accessing the system LDT.

But:

> [  884.496601] PGD 372870067 P4D 372870067 PUD 346e84067 PMD 34005f067 PTE 
> 

WTF? What's that odd bogus PTE entry?

That's also why it gets a RSVD fault. That's just garbage. All-ones is
not a valid PTE.

The other levels look valid, although it strikes me that maybe we
shouldn't have the user bit set in the kernel page tables. I realize
that we clear it at the leaf node, but..
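
The two observations above (the user bit in the upper levels, and all-ones
being an invalid PTE) can be checked by hand with a short sketch. It is a
rough illustration under stated assumptions, not the kernel's own logic:
entry_flags and has_reserved_addr_bits are hypothetical helpers, and
ASSUMED_MAXPHYADDR is a made-up value, since the real physical address width
comes from CPUID on the machine in question.

```python
# Low flag bits of an x86-64 paging-structure entry.
FLAGS = [
    (0, "P"),    # present
    (1, "RW"),   # writable
    (2, "US"),   # user-accessible
    (3, "PWT"),  # page-level write-through
    (4, "PCD"),  # page-level cache disable
    (5, "A"),    # accessed
    (6, "D"),    # dirty (only meaningful in leaf entries)
]

# Assumption for illustration only; the real value is reported by CPUID.
ASSUMED_MAXPHYADDR = 46

def entry_flags(entry):
    """Return the names of the low flag bits set in an entry."""
    return [name for bit, name in FLAGS if entry & (1 << bit)]

def has_reserved_addr_bits(entry, maxphyaddr=ASSUMED_MAXPHYADDR):
    """True if any physical-address bit in 51:maxphyaddr is set (reserved)."""
    reserved = ((1 << 52) - 1) & ~((1 << maxphyaddr) - 1)
    return bool(entry & reserved)

pgd = 0x372870067            # PGD value from the quoted oops
all_ones = (1 << 64) - 1     # the "all-ones" PTE described in the text

print(entry_flags(pgd))                  # ['P', 'RW', 'US', 'A', 'D']
print(has_reserved_addr_bits(pgd))       # False
print(has_reserved_addr_bits(all_ones))  # True
```

The upper-level entries all end in 067, so they decode the same way as the
PGD, with US set. An all-ones entry is marked present and has every
address bit above the assumed physical width set, which is consistent with
the RSVD fault.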

So the user page table is somehow badly set up.

I don't see *why* it would be badly set up, and that test works fine
for me, though.

It doesn't seem to have anything to do with KASAN, although

> [  884.650095] BUG: unable to handle kernel paging request at fe80
> [  884.650103] PGD 363699067 P4D 363699067 PUD 3371c6067 PMD 37cfbc067 PTE 
> 
> [  884.650112] Oops: 0009 [#2] SMP KASAN PTI
> [  884.650200] RIP: 0033:0x401471
> [  884.650203] RSP: 002b:7fc8e6775eb0 EFLAGS: 00010206

The other one is exactly the same thing.

 Linus