Bug#671895: [sparc] Kernel NULL pointer dereference in sungem/gem_poll() (Re: updates)

2013-08-20 Thread Moritz Muehlenhoff
reassign 671895 src:linux
thanks

On Tue, May 22, 2012 at 07:26:22PM -0300, gustavo panizzo gfa wrote:
 On Fri, May 11, 2012 at 11:04:22PM +0100, Jurij Smakov wrote:
 [snip]
 
  
  Only two non-trivial things here: execution of ethtool_lite(if_name)
  and invocation of arping. I would put my money on the former (defined
  in ethtool-lite.c), because it uses low-level ioctls to query the
  interface state.
  
  You can test whether running it would trigger a failure on your
  machine by downloading ethtool-lite.c and building it as a standalone
  binary; the following commands appear to do the trick:
  
  $ sudo apt-get build-dep netcfg
  [...]
  $ gcc -o ethtool-lite -DTEST ethtool-lite.c -ldebconfclient -ldebian-installer
  $ sudo ./ethtool-lite eth0
  ethtool-lite: eth0 is connected.
  $
  
  If that triggers a NULL pointer dereference on your machine (try it both
  with and without the network brought up, and check dmesg afterwards), we
  will be in a very good position to report it upstream for fixing.
 I cannot reproduce the issue with ethtool-lite (or arping) when booting
 from disk, but I can reproduce it when booting from the network (22/05/2012
 image) and running netcfg or udhcp.
 
 I can also reproduce the issue by running
 ~ # ip link set dev eth0 up
 either with the cable already plugged in, or by running the command first
 and plugging the cable in afterwards.
 
 If, after loading the netboot image, I disconnect the cable from eth0 and
 connect eth1 instead, the installer works fine.

Does this still occur with current kernels?

Cheers,
   Moritz


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



2012-05-22 Thread gustavo panizzo gfa
On Fri, May 11, 2012 at 11:04:22PM +0100, Jurij Smakov wrote:
[snip]

 
 Only two non-trivial things here: execution of ethtool_lite(if_name)
 and invocation of arping. I would put my money on the former (defined
 in ethtool-lite.c), because it uses low-level ioctls to query the
 interface state.
 
 You can test whether running it would trigger a failure on your
 machine by downloading ethtool-lite.c and building it as a standalone
 binary; the following commands appear to do the trick:
 
 $ sudo apt-get build-dep netcfg
 [...]
 $ gcc -o ethtool-lite -DTEST ethtool-lite.c -ldebconfclient -ldebian-installer
 $ sudo ./ethtool-lite eth0
 ethtool-lite: eth0 is connected.
 $
 
 If that triggers a NULL pointer dereference on your machine (try it both
 with and without the network brought up, and check dmesg afterwards), we
 will be in a very good position to report it upstream for fixing.
I cannot reproduce the issue with ethtool-lite (or arping) when booting
from disk, but I can reproduce it when booting from the network (22/05/2012
image) and running netcfg or udhcp.

I can also reproduce the issue by running
~ # ip link set dev eth0 up
either with the cable already plugged in, or by running the command first
and plugging the cable in afterwards.

If, after loading the netboot image, I disconnect the cable from eth0 and
connect eth1 instead, the installer works fine.

-- 
1AE0 322E B8F7 4717 BDEA  BF1D 44BB 1BA7 9F6C 6333



2012-05-12 Thread Ben Hutchings
On Fri, 2012-05-11 at 12:25 -0300, gustavo panizzo wrote:
 adding debian-boot
 
 
 I've installed unstable on the box (using debootstrap); it boots
 3.2.0-2-sparc64 successfully and networking works.
 
 OBP diags show no errors.
 
 But when I boot from the network using
 http://d-i.debian.org/daily-images/sparc/daily/netboot/boot.img (11-05-2012)
 
 I get the following error:
 
 Detecting link on eth0; please wait... 100%
 [  246.994391] Unable to handle kernel NULL pointer dereference
 [  247.074490] tsk->{mm,active_mm}->context = 019f
 [  247.164534] tsk->{mm,active_mm}->pgd = f8001d48c000
 [  247.240508] Kernel panic - not syncing: Aiee, killing interrupt handler!
 [  247.328648] Call Trace:
 [  247.360793]  [0045dcd4] do_exit+0x94/0x708
 [  247.423821]  [00427550] die_if_kernel+0x2a0/0x2c8
 [  247.494864]  [00768c84] unhandled_fault+0x8c/0x98
 [  247.565915]  [0076936c] do_sparc64_fault+0x6dc/0x780
 [  247.640377]  [00407880] sparc64_realfault_common+0x10/0x20
 [  247.721722]  [10015680] gem_poll+0x9fc/0x1328 [sungem]
[...]

This means we crashed:

 static __inline__ void gem_tx(struct net_device *dev, struct gem *gp, u32 gem_status)
 {
         int entry, limit;
 
         entry = gp->tx_old;
         limit = ((gem_status & GREG_STAT_TXNR) >> GREG_STAT_TXNR_SHIFT);
         while (entry != limit) {
                 struct sk_buff *skb;
                 struct gem_txd *txd;
                 dma_addr_t dma_addr;
                 u32 dma_len;
                 int frag;
 
                 if (netif_msg_tx_done(gp))
                         printk(KERN_DEBUG "%s: tx done, slot %d\n",
                                gp->dev->name, entry);
                 skb = gp->tx_skbs[entry];
                 if (skb_shinfo(skb)->nr_frags) {

right here, while evaluating skb_shinfo(skb), which probably means skb
was NULL.  This *could* be due to broken hardware telling us that more
packets were sent than we actually queued, but probably not, since
'networking works' when not using netboot.

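To make that reading concrete: gem_tx() trusts the hardware's completion index and assumes every slot between tx_old and that index holds an skb. A minimal userspace model of the walk, assuming one descriptor per packet (no fragments) and using hypothetical names (`fake_ring`, `reclaim_tx`, `RING_SIZE`). Unlike the excerpt, it stops at an empty slot instead of dereferencing NULL. It only illustrates the failure mode and is not a proposed sungem fix:

```c
#include <stddef.h>
#include <stdio.h>

#define RING_SIZE 8                          /* hypothetical; sungem has its own TX ring size */
#define NEXT_SLOT(e) (((e) + 1) % RING_SIZE)

struct fake_skb {
    int len;
};

struct fake_ring {
    struct fake_skb *tx_skbs[RING_SIZE];     /* NULL means "nothing queued here" */
    int tx_old;                              /* oldest slot not yet reclaimed */
};

/*
 * Reclaim slots from r->tx_old up to (not including) `limit`, the completion
 * index reported by the hardware (GREG_STAT_TXNR in the real driver).
 * Returns the number of packets reclaimed, or -1 if the hardware claims
 * completion of a slot that holds no packet.
 */
static int reclaim_tx(struct fake_ring *r, int limit)
{
    int entry = r->tx_old;
    int done = 0;

    while (entry != limit) {
        struct fake_skb *skb = r->tx_skbs[entry];

        if (skb == NULL) {
            /* this is where gem_tx() would call skb_shinfo(NULL) and oops */
            fprintf(stderr, "completion for empty slot %d (limit %d)\n",
                    entry, limit);
            r->tx_old = entry;
            return -1;
        }
        r->tx_skbs[entry] = NULL;            /* the driver would unmap and free here */
        entry = NEXT_SLOT(entry);
        done++;
    }
    r->tx_old = entry;
    return done;
}
```

With two packets queued in slots 0 and 1, a completion index of 2 reclaims both; a later index of 5 hits the empty slot 2 straight away, which is the situation Ben describes.
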
Is the driver successfully resetting the network controller while
net-booting?  It can time out and will then log "SW reset is ghetto", but
it will *not* abort initialisation.

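The point is that a reset timeout is only logged, so initialisation carries on with the controller possibly still mid-reset. A userspace sketch of that poll-with-timeout pattern, with hypothetical bit names (`RST_TX`, `RST_RX`) and a fake register reader standing in for the driver's register access; it is not the actual sungem code:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define RST_TX 0x1u   /* hypothetical stand-ins for the TX/RX reset bits */
#define RST_RX 0x2u

/* Hypothetical register accessor so the sketch can run in userspace. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Poll until both reset bits clear or `tries` reads have been made.
 * On timeout it logs and returns false; nothing here forces the caller
 * to stop, matching the behaviour described above.
 */
static bool wait_for_reset(read_reg_fn read_reg, void *ctx, int tries)
{
    for (;;) {
        uint32_t val = read_reg(ctx);        /* a real driver also delays between reads */

        if (!(val & (RST_TX | RST_RX)))
            return true;
        if (--tries <= 0) {
            fprintf(stderr, "SW reset is ghetto\n");
            return false;
        }
    }
}

/* Fake controller that finishes its reset after *ctx more reads. */
static uint32_t clears_after(void *ctx)
{
    int *remaining = ctx;
    return (*remaining)-- > 0 ? (RST_TX | RST_RX) : 0;
}

/* Fake controller whose reset never completes. */
static uint32_t stuck_in_reset(void *ctx)
{
    (void)ctx;
    return RST_TX | RST_RX;
}
```

A caller that ignores the `false` return and goes on to program the rings is exactly the case where stale or inconsistent TX state could later surface in gem_tx().
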
Ben.

-- 
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed.
 - Carolyn Scheppner


2012-05-11 Thread gustavo panizzo gfa
adding debian-boot


I've installed unstable on the box (using debootstrap); it boots
3.2.0-2-sparc64 successfully and networking works.

OBP diags show no errors.

But when I boot from the network using
http://d-i.debian.org/daily-images/sparc/daily/netboot/boot.img (11-05-2012)

I get the following error:

Detecting link on eth0; please wait... 100%
[  246.994391] Unable to handle kernel NULL pointer dereference
[  247.074490] tsk->{mm,active_mm}->context = 019f
[  247.164534] tsk->{mm,active_mm}->pgd = f8001d48c000
[  247.240508] Kernel panic - not syncing: Aiee, killing interrupt handler!
[  247.328648] Call Trace:
[  247.360793]  [0045dcd4] do_exit+0x94/0x708
[  247.423821]  [00427550] die_if_kernel+0x2a0/0x2c8
[  247.494864]  [00768c84] unhandled_fault+0x8c/0x98
[  247.565915]  [0076936c] do_sparc64_fault+0x6dc/0x780
[  247.640377]  [00407880] sparc64_realfault_common+0x10/0x20
[  247.721722]  [10015680] gem_poll+0x9fc/0x1328 [sungem]
[  247.798478]  [00697110] net_rx_action+0x9c/0x234
[  247.868369]  [004607f0] __do_softirq+0xdc/0x1c4
[  247.937125]  [0042a76c] do_softirq+0x54/0x80
[  248.002442]  [00460a6c] irq_exit+0x38/0x94
[  248.065474]  [0042df38] timer_interrupt+0x90/0xa8
[  248.136516]  [004209d4] tl0_irq14+0x14/0x20
[  248.200692]  [0049e764] touch_softlockup_watchdog+0x4/0xc
[  248.280888]  [008f07e4] start_kernel+0x390/0x3a0
[  248.350783]  [00750b88] tlb_fixup_done+0x80/0x88
[  248.420672]  []   (null)
[  248.481416] Press Stop-A (L1-A) to return to the boot prom



-- 
1AE0 322E B8F7 4717 BDEA  BF1D 44BB 1BA7 9F6C 6333



2012-05-11 Thread Jurij Smakov
On Fri, May 11, 2012 at 12:25:01PM -0300, gustavo panizzo gfa wrote:
 adding debian-boot
 
 
 I've installed unstable on the box (using debootstrap); it boots
 3.2.0-2-sparc64 successfully and networking works.
 
 OBP diags show no errors.
 
 But when I boot from the network using
 http://d-i.debian.org/daily-images/sparc/daily/netboot/boot.img (11-05-2012)
 
 I get the following error:
 
 Detecting link on eth0; please wait... 100%
 [  246.994391] Unable to handle kernel NULL pointer dereference
 [  247.074490] tsk->{mm,active_mm}->context = 019f
 [  247.164534] tsk->{mm,active_mm}->pgd = f8001d48c000
 [  247.240508] Kernel panic - not syncing: Aiee, killing interrupt handler!
 [  247.328648] Call Trace:
 [  247.360793]  [0045dcd4] do_exit+0x94/0x708
 [  247.423821]  [00427550] die_if_kernel+0x2a0/0x2c8
 [  247.494864]  [00768c84] unhandled_fault+0x8c/0x98
 [  247.565915]  [0076936c] do_sparc64_fault+0x6dc/0x780
 [  247.640377]  [00407880] sparc64_realfault_common+0x10/0x20
 [  247.721722]  [10015680] gem_poll+0x9fc/0x1328 [sungem]
 [  247.798478]  [00697110] net_rx_action+0x9c/0x234
 [  247.868369]  [004607f0] __do_softirq+0xdc/0x1c4
 [  247.937125]  [0042a76c] do_softirq+0x54/0x80
 [  248.002442]  [00460a6c] irq_exit+0x38/0x94
 [  248.065474]  [0042df38] timer_interrupt+0x90/0xa8
 [  248.136516]  [004209d4] tl0_irq14+0x14/0x20
 [  248.200692]  [0049e764] touch_softlockup_watchdog+0x4/0xc
 [  248.280888]  [008f07e4] start_kernel+0x390/0x3a0
 [  248.350783]  [00750b88] tlb_fixup_done+0x80/0x88
 [  248.420672]  []   (null)
 [  248.481416] Press Stop-A (L1-A) to return to the boot prom

Interesting, so we are doing something funky during link detection to 
trip this bug. The code which does it is in netcfg:

http://anonscm.debian.org/gitweb/?p=d-i/netcfg.git;a=tree

Here's the relevant code from netcfg-common.c (lines 1277-1300):

    debconf_capb(client, "progresscancel");
    debconf_subst(client, "netcfg/link_detect_progress", "interface", if_name);
    debconf_progress_start(client, 0, 100, "netcfg/link_detect_progress");
    for (count = 0; count < link_waits; count++) {
        usleep(25);
        if (debconf_progress_set(client, 50 * count / link_waits) == 30) {
            /* User cancelled on us... bugger */
            rv = 0;
            break;
        }
        if (ethtool_lite(if_name) == 1) /* ethtool-lite's CONNECTED */ {
            if (gateway.s_addr && !is_wireless_iface(if_name)) {
                for (count = 0; count < gw_tries; count++) {
                    if (di_exec_shell_log(arping) == 0)
                        break;
                    if (debconf_progress_set(client, 50 + 50 * count / gw_tries) == 30)
                        break;
                }
            }
            rv = 1;
            break;
        }
        debconf_progress_set(client, 100);
    }

Only two non-trivial things here: execution of ethtool_lite(if_name)
and invocation of arping. I would put my money on the former (defined
in ethtool-lite.c), because it uses low-level ioctls to query the
interface state.

You can test whether running it would trigger a failure on your
machine by downloading ethtool-lite.c and building it as a standalone
binary; the following commands appear to do the trick:

$ sudo apt-get build-dep netcfg
[...]
$ gcc -o ethtool-lite -DTEST ethtool-lite.c -ldebconfclient -ldebian-installer
$ sudo ./ethtool-lite eth0
ethtool-lite: eth0 is connected.
$

If that triggers a NULL pointer dereference on your machine (try it both
with and without the network brought up, and check dmesg afterwards), we
will be in a very good position to report it upstream for fixing.

Best regards,
-- 
Jurij Smakov   ju...@wooyd.org
Key: http://www.wooyd.org/pgpkey/  KeyID: C99E03CC



2012-05-11 Thread gustavo panizzo gfa


Jurij Smakov ju...@wooyd.org wrote:


If that triggers a NULL pointer dereference on your machine (try it both
with and without the network brought up, and check dmesg afterwards), we
will be in a very good position to report it upstream for fixing.

I will check it next week.
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.



2012-05-10 Thread gustavo panizzo gfa
 Interesting.  How does a 3.2.y kernel behave with the ancient gentoo
 userland?  (Perhaps this is what you are planning to try later.)

Settings for eth0:
    Supported ports: [ TP MII ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 100Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 0
    Transceiver: external
    Auto-negotiation: on
    Supports Wake-on: d
    Wake-on: d
    Current message level: 0x0007 (7)
    Link detected: yes

The kernel is 3.2.15, taken from the linux-source-3.2 package (via apt-get),
with the same Gentoo config.

I cannot get linux-image-3.2.0-2-sparc64_3.2.16-1_sparc to boot because it
is unable to mount the root fs; I see these errors in the kernel log:

[   52.363317] sun_esp: Unknown symbol scsi_esp_register (err 0)
[   52.439003] sun_esp: Unknown symbol scsi_esp_intr (err 0)
[   52.509998] sun_esp: Unknown symbol scsi_host_put (err 0)
[   52.581304] sun_esp: Unknown symbol scsi_esp_template (err 0)
[   52.656890] sun_esp: Unknown symbol scsi_esp_unregister (err 0)
[   52.734804] sun_esp: Unknown symbol scsi_esp_cmd (err 0)
[   52.804672] sun_esp: Unknown symbol scsi_host_alloc (err 0)
[   53.004224] SCSI subsystem initialized

I will continue to experiment with this kernel (hopefully debootstrap will
finish soon).

-- 
1AE0 322E B8F7 4717 BDEA  BF1D 44BB 1BA7 9F6C 6333



2012-05-09 Thread Jonathan Nieder
Hi Gustavo,

gustavo panizzo wrote:

 I can get the NIC to work using the latest Linus tree
 plus an ancient Gentoo userland (udev 124), but it runs at 10Mb/s half duplex

 3.4.0-rc6+
 Settings for eth0:
 Supported ports: [ TP MII ]
 Supported link modes:   10baseT/Half 10baseT/Full 
 100baseT/Half 100baseT/Full 
[...]
 Advertised auto-negotiation: Yes
 Speed: 10Mb/s
 Duplex: Half
[...]
 Auto-negotiation: off
[...]
 while 2.6.28 runs at 100Mb/s full duplex

 Settings for eth0:
 Supported ports: [ TP MII ]
 Supported link modes:   10baseT/Half 10baseT/Full 
 100baseT/Half 100baseT/Full 
[...]
 Advertised auto-negotiation: No
 Speed: 100Mb/s
 Duplex: Full
[...]
 Auto-negotiation: on
[...]
 I will try later with the kernel from d-i or testing, but I think this
 should go upstream.

Interesting.  How does a 3.2.y kernel behave with the ancient gentoo
userland?  (Perhaps this is what you are planning to try later.)


