Re: [ANNOUNCE] v5.9-rc3-rt3

2020-09-09 Sebastian Andrzej Siewior
On 2020-09-09 10:56:41 [+0200], Mike Galbraith wrote:
> On Wed, 2020-09-09 at 10:20 +0200, Sebastian Andrzej Siewior wrote:
> >
> > Do you see the lockdep splat without nouveau?
> 
> Yeah.  Lappy uses i915, but lockdep also shuts itself off.

You sent the config, I will try to throw it later on kvm and actual
hardware and see what happens.

> BTW, methinks RT had nothing to do with the nouveau burp.

that is good to hear :)

>   -Mike

Sebastian


Re: [ANNOUNCE] v5.9-rc3-rt3

2020-09-09 Mike Galbraith
On Wed, 2020-09-09 at 10:20 +0200, Sebastian Andrzej Siewior wrote:
>
> Do you see the lockdep splat without nouveau?

Yeah.  Lappy uses i915, but lockdep also shuts itself off.

BTW, methinks RT had nothing to do with the nouveau burp.

-Mike



Re: [ANNOUNCE] v5.9-rc3-rt3

2020-09-09 Sebastian Andrzej Siewior
On 2020-09-09 07:45:22 [+0200], Mike Galbraith wrote:
> On Wed, 2020-09-09 at 05:12 +0200, Mike Galbraith wrote:
> > On Wed, 2020-09-02 at 17:55 +0200, Sebastian Andrzej Siewior wrote:
> > >
> > > Known issues
> > >  - It has been pointed out that due to changes to the printk code the
> > >    internal buffer representation changed. This is only an issue if tools
> > >    like `crash' are used to extract the printk buffer from a kernel memory
> > >    image.
> >
> > Ouch.  While installing -rt5 on lappy via nfs, -rt5 server box exploded
> > leaving nada in logs.  I have a nifty crash dump of the event, but...
> 
> After convincing crash (with club) that it didn't _really_ need a
> log_buf, nfs had nothing to do with the crash, it was nouveau.

okay. Line 280 is hard to understand. My guess is that we got a pointer
and then the boom occurred but I can't tell why/how. A few lines later
there is args->x = y…
Do you see the lockdep splat without nouveau?

> crash> bt -l
> PID: 2146   TASK: 994c7fad  CPU: 0   COMMAND: "X"
>  #0 [bfffc11a76c8] machine_kexec at b7064879
> /backup/usr/local/src/kernel/linux-master-rt/./include/linux/ftrace.h: 792
>  #1 [bfffc11a7710] __crash_kexec at b7173622
> /backup/usr/local/src/kernel/linux-master-rt/kernel/kexec_core.c: 963
>  #2 [bfffc11a77d0] crash_kexec at b7174920
> /backup/usr/local/src/kernel/linux-master-rt/./arch/x86/include/asm/atomic.h: 41
>  #3 [bfffc11a77e0] oops_end at b702716f
> /backup/usr/local/src/kernel/linux-master-rt/arch/x86/kernel/dumpstack.c: 342
>  #4 [bfffc11a7800] exc_general_protection at b79a2fc6
> /backup/usr/local/src/kernel/linux-master-rt/arch/x86/kernel/traps.c: 82
>  #5 [bfffc11a7890] asm_exc_general_protection at b7a00a1e
> /backup/usr/local/src/kernel/linux-master-rt/./arch/x86/include/asm/idtentry.h: 532
>  #6 [bfffc11a78a0] nvif_object_ctor at c07ee6a7 [nouveau]
> /backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/nouveau/nvif/object.c: 280
>  #7 [bfffc11a7918] __kmalloc at b72eea12
> /backup/usr/local/src/kernel/linux-master-rt/mm/slub.c: 261
>  #8 [bfffc11a7980] nvif_object_ctor at c07ee6a7 [nouveau]
> /backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/nouveau/nvif/object.c: 280

Sebastian


Re: [ANNOUNCE] v5.9-rc3-rt3

2020-09-08 Mike Galbraith
On Wed, 2020-09-09 at 05:12 +0200, Mike Galbraith wrote:
> On Wed, 2020-09-02 at 17:55 +0200, Sebastian Andrzej Siewior wrote:
> >
> > Known issues
> >  - It has been pointed out that due to changes to the printk code the
> >    internal buffer representation changed. This is only an issue if tools
> >    like `crash' are used to extract the printk buffer from a kernel memory
> >    image.
>
> Ouch.  While installing -rt5 on lappy via nfs, -rt5 server box exploded
> leaving nada in logs.  I have a nifty crash dump of the event, but...

After convincing crash (with club) that it didn't _really_ need a
log_buf, nfs had nothing to do with the crash, it was nouveau.

      KERNEL: vmlinux-5.9.0.gf4d51df-rt5-rt.gz
    DUMPFILE: vmcore
        CPUS: 8
        DATE: Wed Sep  9 04:41:24 2020
      UPTIME: 00:08:10
LOAD AVERAGE: 3.17, 1.86, 0.99
       TASKS: 715
    NODENAME: homer
     RELEASE: 5.9.0.gf4d51df-rt5-rt
     VERSION: #1 SMP PREEMPT_RT Wed Sep 9 03:22:01 CEST 2020
     MACHINE: x86_64  (3591 Mhz)
      MEMORY: 16 GB
       PANIC: ""
         PID: 2146
     COMMAND: "X"
        TASK: 994c7fad  [THREAD_INFO: 994c7fad]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt -l
PID: 2146   TASK: 994c7fad  CPU: 0   COMMAND: "X"
 #0 [bfffc11a76c8] machine_kexec at b7064879
/backup/usr/local/src/kernel/linux-master-rt/./include/linux/ftrace.h: 792
 #1 [bfffc11a7710] __crash_kexec at b7173622
/backup/usr/local/src/kernel/linux-master-rt/kernel/kexec_core.c: 963
 #2 [bfffc11a77d0] crash_kexec at b7174920
/backup/usr/local/src/kernel/linux-master-rt/./arch/x86/include/asm/atomic.h: 41
 #3 [bfffc11a77e0] oops_end at b702716f
/backup/usr/local/src/kernel/linux-master-rt/arch/x86/kernel/dumpstack.c: 342
 #4 [bfffc11a7800] exc_general_protection at b79a2fc6
/backup/usr/local/src/kernel/linux-master-rt/arch/x86/kernel/traps.c: 82
 #5 [bfffc11a7890] asm_exc_general_protection at b7a00a1e
/backup/usr/local/src/kernel/linux-master-rt/./arch/x86/include/asm/idtentry.h: 532
 #6 [bfffc11a78a0] nvif_object_ctor at c07ee6a7 [nouveau]
/backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/nouveau/nvif/object.c: 280
 #7 [bfffc11a7918] __kmalloc at b72eea12
/backup/usr/local/src/kernel/linux-master-rt/mm/slub.c: 261
 #8 [bfffc11a7980] nvif_object_ctor at c07ee6a7 [nouveau]
/backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/nouveau/nvif/object.c: 280
 #9 [bfffc11a79d0] nvif_mem_ctor_type at c07eef48 [nouveau]
/backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/nouveau/nvif/mem.c: 74
#10 [bfffc11a7aa8] nouveau_mem_vram at c08b5291 [nouveau]
/backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/nouveau/nouveau_mem.c: 155
#11 [bfffc11a7b10] nouveau_vram_manager_new at c08b594d [nouveau]
/backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/nouveau/nouveau_ttm.c: 76
#12 [bfffc11a7b30] ttm_bo_mem_space at c05af2ac [ttm]
/backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/ttm/ttm_bo.c: 1065
#13 [bfffc11a7b88] ttm_bo_validate at c05afaca [ttm]
/backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/ttm/ttm_bo.c: 1137
#14 [bfffc11a7c18] ttm_bo_init_reserved at c05afe70 [ttm]
/backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/ttm/ttm_bo.c: 1330
#15 [bfffc11a7c60] ttm_bo_init at c05afff7 [ttm]
/backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/ttm/ttm_bo.c: 1364
#16 [bfffc11a7cc8] nouveau_bo_init at c08b0f7b [nouveau]
/backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/nouveau/nouveau_bo.c: 317
#17 [bfffc11a7d38] nouveau_gem_new at c08b2f7b [nouveau]
/backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/nouveau/nouveau_gem.c: 206
#18 [bfffc11a7d70] nouveau_gem_ioctl_new at c08b3001 [nouveau]
/backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/nouveau/nouveau_gem.c: 272
#19 [bfffc11a7da0] drm_ioctl_kernel at c066f564 [drm]
/backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/drm_ioctl.c: 793
#20 [bfffc11a7de0] drm_ioctl at c066f88e [drm]
/backup/usr/local/src/kernel/linux-master-rt/./include/linux/uaccess.h: 168
#21 [bfffc11a7ed0] nouveau_drm_ioctl at c08abf56 [nouveau]
/backup/usr/local/src/kernel/linux-master-rt/drivers/gpu/drm/nouveau/nouveau_drm.c: 1163
#22 [bfffc11a7f08] __x64_sys_ioctl at b733255e
/backup/usr/local/src/kernel/linux-master-rt/fs/ioctl.c: 49
#23 [bfffc11a7f40] do_syscall_64 at b79a25c3
/backup/usr/local/src/kernel/linux-master-rt/arch/x86/entry/common.c: 46
#24 [bfffc11a7f50] entry_SYSCALL_64_after_hwframe at b7a0008c

Re: [ANNOUNCE] v5.9-rc3-rt3

2020-09-08 Mike Galbraith
On Wed, 2020-09-09 at 05:12 +0200, Mike Galbraith wrote:
> On Wed, 2020-09-02 at 17:55 +0200, Sebastian Andrzej Siewior wrote:
> >
> > Known issues
> >  - It has been pointed out that due to changes to the printk code the
> >    internal buffer representation changed. This is only an issue if tools
> >    like `crash' are used to extract the printk buffer from a kernel memory
> >    image.
>
> Ouch.  While installing -rt5 on lappy via nfs, -rt5 server box exploded
> leaving nada in logs.  I have a nifty crash dump of the event, but...

I backed out 1ce98b8a0a1..463463c6fa3f so crash will work again, but
haven't as yet been able to convince box to explode.  Hohum, I'll give
it some time.

Lockdep did repeat dirtying of its diaper though, on both lappy and
desktop boxen at roughly the same uptime.

[  922.978106] BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
[  922.978112] turning off the locking correctness validator.
[  922.978116] CPU: 2 PID: 5837 Comm: kworker/u16:0 Kdump: loaded Tainted: G S  E 5.9.0.gf4d51df-rt5-rt #3
[  922.978120] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  922.978127] Workqueue: writeback wb_workfn (flush-8:48)
[  922.978131] Call Trace:
[  922.978138]  dump_stack+0x77/0x9b
[  922.978143]  validate_chain+0xf60/0x1230
[  922.978147]  __lock_acquire+0x880/0xbf0
[  922.978151]  lock_acquire+0x92/0x3f0
[  922.978155]  ? rt_spin_lock_slowlock_locked+0x5d/0x2c0
[  922.978160]  _raw_spin_lock+0x2f/0x40
[  922.978163]  ? rt_spin_lock_slowlock_locked+0x5d/0x2c0
[  922.978169]  rt_spin_lock_slowlock_locked+0x5d/0x2c0
[  922.978173]  __read_rt_lock+0x97/0xc0
[  922.978194]  ext4_es_lookup_extent+0x4f/0x410 [ext4]
[  922.978205]  ext4_map_blocks+0x50/0x530 [ext4]
[  922.978209]  ? kmem_cache_alloc+0x636/0x8b0
[  922.978220]  ext4_writepages+0xa2c/0x1330 [ext4]
[  922.978228]  ? do_writepages+0x3c/0xe0
[  922.978231]  do_writepages+0x3c/0xe0
[  922.978236]  ? __writeback_single_inode+0x62/0x890
[  922.978240]  __writeback_single_inode+0x62/0x890
[  922.978244]  writeback_sb_inodes+0x217/0x580
[  922.978250]  __writeback_inodes_wb+0x5d/0xd0
[  922.978254]  wb_writeback+0x28c/0x620
[  922.978259]  ? wb_workfn+0x2bc/0x7f0
[  922.978262]  wb_workfn+0x2bc/0x7f0
[  922.978266]  ? lock_acquire+0x92/0x3f0
[  922.978270]  ? process_one_work+0x1fa/0x730
[  922.978274]  ? process_one_work+0x284/0x730
[  922.978278]  ? process_one_work+0x251/0x730
[  922.978281]  process_one_work+0x284/0x730
[  922.978285]  ? _raw_spin_lock_irq+0x16/0x50
[  922.978289]  ? process_one_work+0x730/0x730
[  922.978293]  worker_thread+0x39/0x3f0
[  922.978297]  ? process_one_work+0x730/0x730
[  922.978300]  kthread+0x171/0x190
[  922.978304]  ? kthread_park+0x90/0x90
[  922.978308]  ret_from_fork+0x1f/0x30



Re: [ANNOUNCE] v5.9-rc3-rt3

2020-09-08 Mike Galbraith
On Wed, 2020-09-02 at 17:55 +0200, Sebastian Andrzej Siewior wrote:
>
> Known issues
>  - It has been pointed out that due to changes to the printk code the
>    internal buffer representation changed. This is only an issue if tools
>    like `crash' are used to extract the printk buffer from a kernel memory
>    image.

Ouch.  While installing -rt5 on lappy via nfs, -rt5 server box exploded
leaving nada in logs.  I have a nifty crash dump of the event, but...

-Mike



[ANNOUNCE] v5.9-rc3-rt3

2020-09-02 Sebastian Andrzej Siewior
Dear RT folks!

I'm pleased to announce the v5.9-rc3-rt3 patch set. 

Changes since v5.9-rc3-rt2:

  - Correct a compile issue in the i915 driver. Reported by Carsten Emde
    and Daniel Wagner.

  - Mark Marshall reported a crash on PowerPC. The reason for the crash
    is a race in exec_mmap() vs a context switch and is not limited to
    PowerPC. This race is present since v5.4.3-rt1 and is addressed in
    two changes:

    - commit 38cf307c1f201 ("mm: fix kthread_use_mm() vs TLB invalidate")
      which is part of v5.9-rc1.

    - patch "mm: fix exec activate_mm vs TLB shootdown and lazy tlb switching race"
      by Nicholas Piggin which has been posted for review and is not yet
      merged upstream.
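
For readers following the i915 fix listed above: in the delta appended to
this mail it replaces local_lock_irq()/local_unlock_irq() on
pipe_update_lock with a plain interrupt disable that is skipped when
CONFIG_PREEMPT_RT is set. What follows is only a minimal userspace sketch
of that guard pattern, not kernel code. Every name prefixed sketch_ or
SKETCH_ is a hypothetical stand-in introduced for illustration; the real
code uses IS_ENABLED(CONFIG_PREEMPT_RT) together with local_irq_disable()
and local_irq_enable().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical compile-time switch standing in for
 * IS_ENABLED(CONFIG_PREEMPT_RT). Build with -DSKETCH_PREEMPT_RT=1
 * to model an RT kernel.
 */
#ifndef SKETCH_PREEMPT_RT
#define SKETCH_PREEMPT_RT 0
#endif

/* Models the "interrupts disabled" state of the local CPU. */
static bool sketch_irqs_off;

/* Stand-ins for local_irq_disable() / local_irq_enable(). */
static void sketch_irq_disable(void) { sketch_irqs_off = true; }
static void sketch_irq_enable(void)  { sketch_irqs_off = false; }

/* Entry of the critical section: only !RT kernels turn interrupts off. */
static void sketch_pipe_update_start(void)
{
	if (!SKETCH_PREEMPT_RT)
		sketch_irq_disable();
}

/* Exit of the critical section: mirror of the guard above. */
static void sketch_pipe_update_end(void)
{
	if (!SKETCH_PREEMPT_RT)
		sketch_irq_enable();
}

/* Runs one start/end pair and reports whether interrupts were off inside. */
static bool sketch_run_update(void)
{
	bool off_inside;

	sketch_pipe_update_start();
	off_inside = sketch_irqs_off;
	sketch_pipe_update_end();
	return off_inside;
}
```

Calling sketch_run_update() from a small main() returns true in the
default build and false when built with -DSKETCH_PREEMPT_RT=1. Because the
condition is a compile-time constant, the compiler can drop the untaken
branch while still type-checking both sides, which is the usual reason to
prefer this form over an #ifdef.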

Known issues
 - It has been pointed out that due to changes to the printk code the
   internal buffer representation changed. This is only an issue if tools
   like `crash' are used to extract the printk buffer from a kernel memory
   image.

The delta patch against v5.9-rc3-rt2 is appended below and can be found here:

    https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.9/incr/patch-5.9-rc3-rt2-rt3.patch.xz

You can get this release via the git tree at:

    git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git v5.9-rc3-rt3

The RT patch against v5.9-rc3 can be found here:

    https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.9/older/patch-5.9-rc3-rt3.patch.xz

The split quilt queue is available at:

    https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.9/older/patches-5.9-rc3-rt3.tar.xz

Sebastian

diff --git a/arch/Kconfig b/arch/Kconfig
index 222e553f3cf50..5c8e173dc7c2b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -415,6 +415,13 @@ config MMU_GATHER_NO_GATHER
 	bool
 	depends on MMU_GATHER_TABLE_FREE
 
+config ARCH_WANT_IRQS_OFF_ACTIVATE_MM
+	bool
+	help
+	  Temporary select until all architectures can be converted to have
+	  irqs disabled over activate_mm. Architectures that do IPI based TLB
+	  shootdowns should enable this.
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
diff --git a/drivers/gpu/drm/i915/display/intel_display_types.h b/drivers/gpu/drm/i915/display/intel_display_types.h
index c5700f44422ec..e8f809161c75f 100644
--- a/drivers/gpu/drm/i915/display/intel_display_types.h
+++ b/drivers/gpu/drm/i915/display/intel_display_types.h
@@ -29,7 +29,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -1150,7 +1149,6 @@ struct intel_crtc {
 #ifdef CONFIG_DEBUG_FS
 	struct intel_pipe_crc pipe_crc;
 #endif
-	local_lock_t pipe_update_lock;
 };
 
 struct intel_plane {
diff --git a/drivers/gpu/drm/i915/display/intel_sprite.c b/drivers/gpu/drm/i915/display/intel_sprite.c
index 62b8248d2ee79..1b9d5e690a9f0 100644
--- a/drivers/gpu/drm/i915/display/intel_sprite.c
+++ b/drivers/gpu/drm/i915/display/intel_sprite.c
@@ -118,7 +118,8 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
 			"PSR idle timed out 0x%x, atomic update may fail\n",
 			psr_status);
 
-	local_lock_irq(&crtc->pipe_update_lock);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_disable();
 
 	crtc->debug.min_vbl = min;
 	crtc->debug.max_vbl = max;
@@ -143,11 +144,13 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
 			break;
 		}
 
-		local_unlock_irq(&crtc->pipe_update_lock);
+		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_enable();
 
 		timeout = schedule_timeout(timeout);
 
-		local_lock_irq(&crtc->pipe_update_lock);
+		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_disable();
 	}
 
 	finish_wait(wq, &wait);
@@ -180,7 +183,8 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
 	return;
 
 irq_disable:
-	local_lock_irq(&crtc->pipe_update_lock);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_disable();
 }
 
 /**
@@ -218,7 +222,8 @@ void intel_pipe_update_end(struct intel_crtc_state *new_crtc_state)
 		new_crtc_state->uapi.event = NULL;
 	}
 
-	local_unlock_irq(&crtc->pipe_update_lock);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_enable();
 
 	if (intel_vgpu_active(dev_priv))
 		return;
diff --git a/fs/exec.c b/fs/exec.c
index a91003e28eaae..d4fb18baf1fb1 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1130,11 +1130,24 @@ static int exec_mmap(struct mm_struct *mm)
 	}
 
 	task_lock(tsk);
-	active_mm = tsk->active_mm;
 	membarrier_exec_mmap(mm);
-	tsk->mm = mm;
+
+	local_irq_disable();
+	active_mm = tsk->active_mm;
 	tsk->active_mm = mm;
+	tsk->mm = mm;
+	/*
+	 * This prevents preemption while active_mm is being loaded and
+	 * it and mm are