------- Comment From s...@us.ibm.com 2015-01-14 20:03 EDT-------
(In reply to comment #9)
> Suka,
>
> If patches [3] and [4] would make kernel 3.19, we will not need to backport
> them, since this bug is target for 15.04 (and 15.04 will ship probably with
> 3.19 kernel) On the other side, if we miss 3.19 window, I would ask you to
> backport them and attach the backport over 3.19 vivid git repository.
>
> Canonical, correct me if I am wrong, please.
>
> Thanks
> Breno

Breno,

Patches [3] and [4] have been merged into 3.19-rc4:

commit f34b6c7
Author: suka...@linux.vnet.ibm.com <suka...@linux.vnet.ibm.com>
Date:   Wed Dec 10 14:29:13 2014 -0800

powerpc/perf/hv-24x7: Use per-cpu page buffer

commit ec2aef5
Author: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Date:   Wed Dec 10 01:43:34 2014 -0500

power/perf/hv-24x7: Use kmem_cache_free() instead of kfree
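
For anyone who wants to confirm this locally, the check below is a minimal sketch, assuming it is run inside a clone of the mainline kernel tree that carries the v3.19-rc4 tag. The function name commits_in_tag is hypothetical; git merge-base --is-ancestor exits 0 when the first commit is reachable from the second.

```shell
#!/bin/bash
# Sketch: report whether each commit is already contained in a given tag.
# Assumption: run from inside a kernel git clone that has the tag.
# commits_in_tag is a hypothetical helper name, not part of git.
commits_in_tag() {
    local tag=$1; shift
    local c rc
    for c in "$@"; do
        git merge-base --is-ancestor "$c" "$tag" 2>/dev/null
        rc=$?
        case $rc in
            0) echo "$c: in $tag" ;;
            1) echo "$c: NOT in $tag" ;;
            *) echo "$c: could not check (unknown commit or tag?)" ;;
        esac
    done
}

# Example: commits_in_tag v3.19-rc4 f34b6c7 ec2aef5
```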

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1410519

Title:
  [PowerVM] Kernel BUG @ kernel/irq_work.c:157!  - 24x7 hw counters

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Steps to recreate the problem:

  1.  Install Ubuntu 15.04 as a PowerVM guest.
  2.  Install the perf tool.
  3.  Run the following scripts to test 24x7 POWER8 hardware counter events
      with the perf tool.

  ===  Script 1
  #!/bin/bash

  count=0;

  offset=0x128
  PERF_ARGS="-r 10 -C 0"
  while [ $count -lt 100 ]; do

          EVENT="hv_24x7/domain=0x2,offset=$offset,starting_index=10/"

          perf stat $PERF_ARGS -x ' ' perf stat $PERF_ARGS -x ' ' -e $EVENT ls

          count=$((count + 1))
  done

  ==== Script 2
  #!/bin/bash

  offset=0;

  PERF_ARGS="-r 10 -C 0"
  while [ $offset -lt 8192 ]; do

          EVENT="hv_24x7/domain=0x2,offset=$offset,starting_index=10/"

          perf stat $PERF_ARGS -x ' ' perf stat $PERF_ARGS -x ' ' -e $EVENT ls

          offset=)
  done
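
The increment in the offset loop above did not survive in this report. The following is a minimal sketch of the same sweep with the step made explicit. STEP (default 8) is an assumption, not the original value, and the helper names hv24x7_event and sweep_offsets are hypothetical. The sketch also assumes a single perf stat invocation per offset. Setting DRY_RUN=1 prints each command instead of running perf.

```shell
#!/bin/bash
# Sketch of the Script 2 offset sweep with an explicit increment.
# ASSUMPTION: STEP is a guess; the original increment is not in the report.
STEP=${STEP:-8}
PERF_ARGS="-r 10 -C 0"

# Build the hv_24x7 event string for one offset (hypothetical helper).
hv24x7_event() {
    echo "hv_24x7/domain=0x2,offset=$1,starting_index=10/"
}

# Walk offsets 0 .. 8191 in increments of STEP (hypothetical helper).
sweep_offsets() {
    local offset=0
    local event
    while [ "$offset" -lt 8192 ]; do
        event=$(hv24x7_event "$offset")
        if [ -n "$DRY_RUN" ]; then
            # Print the command that would be run.
            echo perf stat $PERF_ARGS -x ' ' -e "$event" ls
        else
            perf stat $PERF_ARGS -x ' ' -e "$event" ls
        fi
        offset=$((offset + STEP))
    done
}
```

For example, after sourcing the block, DRY_RUN=1 sweep_offsets | head shows the first few commands without touching the counters.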

  After a few iterations I hit the following BUG.

  tt2.sh  tt.sh
  tt2.sh  tt.sh
  tt2.sh  tt.sh
  275679187521558  hv_24x7/domain=0x2,offset=6848,starting_index=10/ 0.00%
  tt2.sh  tt.sh
  [ 4657.314709] softirq: huh, entered softirq 7 SCHED c00000000010abc0 with preempt_count 00000100, exited with bfff0000?
  [ 4657.314727] kernel BUG at /build/buildd/linux-3.16.0/kernel/irq_work.c:157!
  [ 4657.314732] Oops: Exception in kernel mode, sig: 5 [#1]
  [ 4657.314740] Modules linked in: rtc_generic pseries_rng
  [ 4657.314749] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-25-generic #33-U
  [ 4657.314755] task: c000000001375e00 ti: c0000000013d0000 task.ti: c0000000013d0000
  [ 4657.314759] NIP: c0000000001e8ffc LR: c00000000001fe70 CTR: c000000000002800ic)
  [ 4657.314770] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28042024  XER: 0000000a
  [ 4657.314782] CFAR: c00000000001fe6c SOFTE: 0
  GPR04: 0000000000000010 00000000009c0000 c000000001424a98 0000000000000002
  GPR12: 8000000000009033 c00000000e9a0000 0000000006a3fcd0 0000000000000060
  GPR16: 0000000000200000 0000000000000000 c000000000e57c00 0000000000000000
  GPR20: c000000001595dca c000000001595478 0000000000000001 000000000000ffff
  GPR28: c000000000e40380 c000000000e40300 c0000000013d3590 c000000000e56f08
  [ 4657.314832] NIP [c0000000001e8ffc] irq_work_run+0x1c/0x30
  [ 4657.314841] Call Trace:
  4000 (unreliable)
  [ 4657.314861] [c0000000013d34f0] [c00000000001ff90] timer_interrupt+0xa0/0xe0
  [ 4657.314871] [c0000000013d3520] [c000000000002914] decrementer_common+0x114/0x180
  [ 4657.314884] --- Exception: 901 at arch_local_irq_restore+0x14/0x90
  [ 4657.314896] [c0000000013d3810] [c00000000012ed08] vprintk_emit+0x3b8/0x660 (u
  [ 4657.314908] [c0000000013d38e0] [c000000000a02650] printk+0x84/0x98
  [ 4657.314918] [c0000000013d3910] [c0000000000b51b4] __do_softirq+0x1e4/0x410
  [ 4657.314927] [c0000000013d3a00] [c0000000000b57b8] irq_exit+0xf8/0x1400
  [ 4657.314948] [c0000000013d3a60] [c000000000002c14] doorbell_super_common+0x114/0x180
  [ 4657.314963] --- Exception: a01 at plpar_hcall_norets+0x8c/0xdc
  [ 4657.314963]     LR = check_and_cede_processor+0x34/0x5020/0x50 (unreliable)
  [ 4657.314997] [c0000000013d3df0] [c00000000084077c] cpuidle_enter_state+0x6c/0x140c0
  [ 4657.315030] [c0000000013d3f00] [c000000000d63ea8] start_kernel+0x500/0x51c
  [ 4657.315047] Instruction dump:
  [ 4657.315052] eba1ffe8 7c0803a6 ebc1fff0 ebe1fff8 4e800020 3c4c011f 3842c110 78290464
  [ 4657.315068] 81290014 752a000f 7d380026 55291ffe <0b090000> 4bfffec8 60000000 60000000
  [ 4657.315090] ---[ end trace ee202cccd2211e5d ]---
  [ 4657.320224]
  [ 4657.362675] Unable to handle kernel paging request for data at address 0xc000000b35515048
  [ 4657.362680] Faulting instruction address: 0xc00000000006a37c
  [ 4657.362684] Oops: Kernel access of bad area, sig: 11 [#2]
  [ 4657.362686] SMP NR_CPUS=2048 NUMA pSeries
  [ 4657.362695] CPU: 12 PID: 7 Comm: rcu_sched Tainted: G      D       3.16.0-25-
  [ 4657.362699] task: c0000000eb581540 ti: c0000000eb604000 task.ti: c0000000eb60
  [ 4657.362703] NIP: c00000000006a37c LR: c0000000000865a8 CTR: c00000000006a340
  [ 4657.362706] REGS: c0000000eb607800 TRAP: 0300   Tainted: G      D        (3.16.0-25-generic)
  00000000
  [ 4657.362718] CFAR: c0000000000865a4 DAR: c000000b35515048 DSISR: 40000000 SOFTE: 0
  GPR00: c0000000000865a8 c0000000eb607a80 c0000000013d50f0 00000000013d30d0
  GPR08: 0000000000cc0000 c000000b35515000 c00000000e9a0000 0000000000000000
  GPR12: c00000000006a340 c00000000e9a6c00 0000000000000000 0000000000000001
  GPR20: 0000000000000000 c000000001389700 0000000000000000 0000000000000001
  GPR28: c000000001420a68 0000000000000000 00000000013d30d0 0000000000000001
  [ 4657.362758] NIP [c00000000006a37c] icp_hv_cause_ipi+0x3c/0xc0
  [ 4657.362762] LR [c0000000000865a8] pSeries_cause_ipi_mux+0x88/0xc0
  [ 4657.362765] Call Trace:
  0 (unreliable)
  [ 4657.362774] [c0000000eb607af0] [c0000000000865a8] pSeries_cause_ipi_mux+0x88/0xc0
  [ 4657.362778] [c0000000eb607b20] [c0000000000426f0] smp_muxed_ipi_message_pass+0x70/0x90
  [ 4657.362783] [c0000000eb607b60] [c0000000000f3a58] resched_task+0x118/0x140
  [ 4657.362786] [c0000000eb607b90] [c0000000000f3da0] resched_cpu+0xc0/0x110
  [ 4657.362791] [c0000000eb607be0] [c00000000013f170] rcu_implicit_dynticks_qs+0x200/0x230
  [ 4657.362795] [c0000000eb607c10] [c00000000013de1c] force_qs_rnp+0x14c/0x250
  [ 4657.362799] [c0000000eb607c90] [c0000000001407f0] rcu_gp_kthread+0x430/0x8e0
  [ 4657.362803] [c0000000eb607d80] [c0000000000e0820] kthread+0x110/0x130
  [ 4657.362807] [c0000000eb607e30] [c00000000000a468] ret_from_kernel_thread+0x5c/0x74
  [ 4657.362810] Instruction dump:
  [ 4657.362812] fbc1fff0 fbe1fff8 f8010010 f821ff91 7c7e1b78 60000000 60000000 3d220008
  [ 4657.362818] 39493f00 1d3e0900 e94a0000 7d2a4a14 <abe90048> 7c0004ac 3860006c 7fe4fb78
  [ 4657.362825] ---[ end trace ee202cccd2211e5e ]---
  [ 4657.365085]
  [ 4659.320264] Kernel panic - not syncing: Attempted to kill the idle task!
  [ 4659.325500] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!

  Backported the following 4 commits/patches from upstream[1]:

          1. commit d658972
          Author: Himangi Saraogi <himangi...@gmail.com>
          Date:   Tue Jul 22 23:40:19 2014 +0530

              powerpc/perf/hv-24x7: Use kmem_cache_free

          2. commit 48bee8a
          Author: Cody P Schafer <d...@codyps.com>
          Date:   Tue Sep 30 23:03:17 2014 -0700
   
              powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocations

          3. https://lkml.org/lkml/2014/12/10/613
          4. https://lkml.org/lkml/2014/12/10/36

  to the vivid kernel[2]. With them applied, the problem no longer reproduces.

  Will Canonical cherry-pick those commits, or should we backport them?
  (They apply without conflicts.)

  [1] Patches 3 and 4 above were posted recently; the powerpc
        maintainer plans to merge them.

  [2] git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git
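
A cherry-pick of the four commits could look roughly like the sketch below. It assumes the upstream commits have already been fetched into a clone of the vivid tree, and that patches 3 and 4 correspond to the merged commits f34b6c7 and ec2aef5 named in the comment thread. The function name backport_commits is hypothetical, and the commit order in the example simply follows the list above; it is not a verified apply order.

```shell
#!/bin/bash
# Sketch: apply upstream commits, in the order given, onto the current branch.
# -x appends "(cherry picked from commit ...)" to each new commit message.
# backport_commits is a hypothetical helper name, not part of git.
backport_commits() {
    local c
    for c in "$@"; do
        if ! git cherry-pick -x "$c"; then
            echo "cherry-pick of $c failed; resolve it or run: git cherry-pick --abort" >&2
            return 1
        fi
    done
}

# Example (assumes the commits are present in the local object store):
# backport_commits d658972 48bee8a f34b6c7 ec2aef5
```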

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1410519/+subscriptions