Re: [PATCH net] net: sungem: fix rx checksum support
> BTW, removing the FCS also means GRO is going to work, finally on this NIC ;)
>
> GRO does not like packets with padding.

As a follow-up, I am seeing hw csum failures on Sun V440 that has onboard Sun Cassini with sungem driver. First tested version was 4.18 (it happened there once) and now that I tried 4.18+git, it still happens:

[ 21.563282] libphy: Fixed MDIO Bus: probed
[ 21.617116] cassini: cassini.c:v1.6 (21 May 2008)
[ 21.678962] cassini :00:02.0: enabling device (0144 -> 0146)
[ 21.761931] cassini :00:02.0 eth0: Sun Cassini+ (64bit/66MHz PCI/Cu) Ethernet[6] 00:03:ba:6f:14:39
[ 21.884952] cassini 0003:00:01.0: enabling device (0144 -> 0146)
[ 21.967868] cassini 0003:00:01.0 eth1: Sun Cassini+ (64bit/66MHz PCI/Cu) Ethernet[29] 00:03:ba:6f:14:3a
[...]
[ 54.341212] eth0: hw csum failure
[ 54.384725] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.18.0-12952-g2923b27 #1397
[ 54.483167] Call Trace:
[ 54.515209] [0077838c] __skb_checksum_complete+0xcc/0xe0
[ 54.595272] [0080fc84] igmp_rcv+0x224/0x920
[ 54.660475] [007ca3d0] ip_local_deliver+0xb0/0x240
[ 54.733675] [007ca5c0] ip_rcv+0x60/0xa0
[ 54.794304] [00781a30] __netif_receive_skb_one_core+0x30/0x60
[ 54.880094] [00782914] process_backlog+0x94/0x140
[ 54.952161] [00788f6c] net_rx_action+0x1ec/0x320
[ 55.023083] [00870de8] __do_softirq+0xc8/0x200
[ 55.091719] [0042c4cc] do_softirq_own_stack+0x2c/0x40
[ 55.168362] [004662d8] irq_exit+0xb8/0xe0
[ 55.231266] [00870ac0] handler_irq+0xc0/0x100
[ 55.298756] [004208b4] tl0_irq5+0x14/0x20
[ 55.361670] [0042cafc] arch_cpu_idle+0x9c/0xa0
[ 55.447055] [0048a254] cpu_startup_entry+0x14/0x40
[ 55.536998] [0095f4b4] 0x95f4b4
[ 55.588471] [4000] 0x4000
[ 179.780371] eth0: hw csum failure
[ 179.823878] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.18.0-12952-g2923b27 #1397
[ 179.922230] Call Trace:
[ 179.954267] [0077838c] __skb_checksum_complete+0xcc/0xe0
[ 180.034335] [0080fc84] igmp_rcv+0x224/0x920
[ 180.099536] [007ca3d0] ip_local_deliver+0xb0/0x240
[ 180.172740] [007ca5c0] ip_rcv+0x60/0xa0
[ 180.233368] [00781a30] __netif_receive_skb_one_core+0x30/0x60
[ 180.319159] [00782914] process_backlog+0x94/0x140
[ 180.391225] [00788f6c] net_rx_action+0x1ec/0x320
[ 180.462148] [00870de8] __do_softirq+0xc8/0x200
[ 180.530782] [0042c4cc] do_softirq_own_stack+0x2c/0x40
[ 180.607422] [004662d8] irq_exit+0xb8/0xe0
[ 180.670331] [00870ac0] handler_irq+0xc0/0x100
[ 180.737822] [004208b4] tl0_irq5+0x14/0x20
[ 180.800735] [0042caf8] arch_cpu_idle+0x98/0xa0
[ 180.869373] [00489f60] do_idle+0xe0/0x1c0
[ 180.932281] [0048a25c] cpu_startup_entry+0x1c/0x40
[ 181.005491] [0098e9b4] start_kernel+0x3b8/0x3c8

-- 
Meelis Roos (mr...@linux.ee)
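
Note on the quoted remark about the FCS: the following is a minimal user-space sketch, not code from the sungem or cassini drivers, of why trailing bytes matter for a checksum taken over the whole received frame. It assumes a device that reports a 16-bit ones' complement sum over everything it received; the helper names `ones_complement_sum` and `trailing_bytes_change_sum` are hypothetical. It does not claim to explain the traces reported above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement sum over a byte buffer (RFC 1071 style). */
static uint16_t ones_complement_sum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    /* Add the buffer as big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];

    /* An odd trailing byte is treated as the high byte of a final word. */
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
}

/*
 * Hypothetical helper: returns 1 if summing the whole frame (packet plus
 * trailing bytes such as an FCS) gives a different result than summing
 * only the first pkt_len bytes, 0 otherwise.
 */
static int trailing_bytes_change_sum(const uint8_t *frame,
                                     size_t pkt_len, size_t frame_len)
{
    assert(pkt_len <= frame_len);
    return ones_complement_sum(frame, pkt_len) !=
           ones_complement_sum(frame, frame_len);
}
```

In this sketch, trailing zero bytes leave the sum unchanged, while non-zero trailing bytes (for example a 4-byte FCS) alter it, so a sum taken over the full frame would have to be corrected before it could be compared with one taken over the packet alone.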
Re: Invalid sk_policy[] access
> > Indeed, the kernel is 64-bit in both cases.
> > And the userland bit-arity has no relevance whatsoever for this bug.
>
> hang on; The sizeof (and offsetof) values I listed were obtained either
> from /usr/bin/crash (on the T5) or from simple printk's of the structures
> in the case of the v440. And they *are* different, and the numbers

Since there are no config-dependent differences in the struct, maybe it's a compiler version difference for padding/optimization instead?

-- 
Meelis Roos (mr...@linux.ee)
Re: Invalid sk_policy[] access
> > Indeed, the kernel is 64-bit in both cases.
> > And the userland bit-arity has no relevance whatsoever for this bug.
>
> hang on; The sizeof (and offsetof) values I listed were obtained either
> from /usr/bin/crash (on the T5) or from simple printk's of the structures
> in the case of the v440. And they *are* different, and the numbers
> match the values dumped on the console on panic. So isn't there actually
> a problem here?

There certainly seems to be a problem - that's how you ended up looking under that rock. Maybe the offsets are different because of different kernel config?

-- 
Meelis Roos (mr...@linux.ee)
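
To illustrate how a config-dependent member could shift sizeof and offsetof values, here is a small user-space sketch. The two structures are hypothetical stand-ins for "the same structure built with and without an optional member"; they are not the real kernel structure discussed in this thread, and `debug_cookie`, `print_layouts` and `policy_offset_shift` are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical layout as built without an optional feature. */
struct demo_sock_a {
    int   refcnt;
    void *policy[2];
};

/* Hypothetical layout as built with the optional feature enabled. */
struct demo_sock_b {
    int   refcnt;
    long  debug_cookie;   /* extra member present only in this build */
    void *policy[2];
};

/* How far the policy[] member moves between the two layouts, in bytes. */
static size_t policy_offset_shift(void)
{
    size_t a = offsetof(struct demo_sock_a, policy);
    size_t b = offsetof(struct demo_sock_b, policy);

    assert(b >= a);
    return b - a;
}

/* User-space counterpart of the "simple printk" approach. */
static void print_layouts(void)
{
    printf("a: sizeof=%zu offsetof(policy)=%zu\n",
           sizeof(struct demo_sock_a), offsetof(struct demo_sock_a, policy));
    printf("b: sizeof=%zu offsetof(policy)=%zu\n",
           sizeof(struct demo_sock_b), offsetof(struct demo_sock_b, policy));
}
```

Calling `print_layouts()` on each build and comparing the output is one way to check whether two kernels really disagree about a structure's layout; the exact numbers depend on the ABI and compiler.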
Re: [PATCH net-next v2] rhashtable: Allow other tasks to be scheduled in large lookup loops
> Depending on system speed, the large lookup/insert/delete loops of the testsuite can take a considerable amount of time to complete causing watchdog warnings to appear. Allow other tasks to be scheduled throughout the loops.
>
> Reported-by: Meelis Roos mr...@linux.ee
> Signed-off-by: Thomas Graf tg...@suug.ch
> ---
> v2: Use cond_resched() instead of schedule()

Tested it. The warning is gone from the rhashtable test but now it is present in the rbtree test (it was not there before). Same kernel, just your patch applied - but it should not change the rbtree test???

[0.00] PROMLIB: Sun IEEE Boot Prom 'OBP 3.31.0 2001/07/25 20:36'
[0.00] PROMLIB: Root node compatible:
[0.00] Linux version 4.2.0-rc2-00077-gf760b87-dirty (mroos@u5) (gcc version 4.9.3 (Debian 4.9.3-1) ) #21 Fri Jul 17 20:15:21 EEST 2015
[0.00] bootconsole [earlyprom0] enabled
[0.00] ARCH: SUN4U
[0.00] Ethernet address: 08:00:20:f8:c7:72
[0.00] MM: PAGE_OFFSET is 0xf800 (max_phys_bits == 40)
[0.00] MM: VMALLOC [0x0001 -- 0x0600]
[0.00] MM: VMEMMAP [0x0600 -- 0x0c00]
[0.00] Kernel: Using 10 locked TLB entries for main kernel image.
[0.00] Remapping the kernel... done.
[0.00] kmemleak: Kernel memory leak detector disabled
[0.00] OF stdout device is: /pci@1f,0/pci@1,1/ebus@1/se@14,40:a
[0.00] PROM: Built device tree with 70266 bytes of memory.
[0.00] Top of RAM: 0x1ff2c000, Total RAM: 0x1ff2a000
[0.00] Memory hole size: 0MB
[0.00] Allocated 16384 bytes for kernel page tables.
[0.00] Zone ranges:
[0.00] Normal [mem 0x-0x1ff2bfff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00] node 0: [mem 0x-0x1fefdfff]
[0.00] node 0: [mem 0x1ff0-0x1ff2bfff]
[0.00] Initmem setup node 0 [mem 0x-0x1ff2bfff]
[0.00] On node 0 totalpages: 65429
[0.00] Normal zone: 512 pages used for memmap
[0.00] Normal zone: 0 pages reserved
[0.00] Normal zone: 65429 pages, LIFO batch:15
[0.00] Booting Linux...
[0.00] CPU CAPS: [flush,stbar,swap,muldiv,v9,mul32,div32,v8plus]
[0.00] CPU CAPS: [vis]
[0.00] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[0.00] pcpu-alloc: [0] 0
[0.00] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 64917
[0.00] Kernel command line: root=/dev/sda1 ro
[0.00] PID hash table entries: 2048 (order: 1, 16384 bytes)
[0.00] Dentry cache hash table entries: 65536 (order: 6, 524288 bytes)
[0.00] Inode-cache hash table entries: 32768 (order: 5, 262144 bytes)
[0.00] Sorting __ex_table...
[0.00] Memory: 475912K/523432K available (5270K kernel code, 516K rwdata, 1672K rodata, 520K init, 30210K bss, 47520K reserved, 0K cma-reserved)
[0.00] Running RCU self tests
[0.00] Testing tracer nop: PASSED
[0.00] NR_IRQS:2048 nr_irqs:2048 1
[ 26.882478] clocksource: tick: mask: 0x max_cycles: 0x5306eb473f, max_idle_ns: 440795213232 ns
[ 26.986192] clocksource: mult[2c71c72] shift[24]
[ 27.025729] clockevent: mult[5c28f5c3] shift[32]
[ 27.067997] Console: colour dummy device 80x25
[ 27.104149] console [tty0] enabled
[ 27.128868] bootconsole [earlyprom0] disabled
[ 27.165340] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 27.165405] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 27.165445] ... MAX_LOCK_DEPTH: 48
[ 27.165486] ... MAX_LOCKDEP_KEYS:8191
[ 27.165529] ... CLASSHASH_SIZE: 4096
[ 27.165574] ... MAX_LOCKDEP_ENTRIES: 32768
[ 27.165617] ... MAX_LOCKDEP_CHAINS: 65536
[ 27.165662] ... CHAINHASH_SIZE: 32768
[ 27.165706] memory used by lock dependency info: 8159 kB
[ 27.165756] per task-struct memory footprint: 1920 bytes
[ 27.165802]
[ 27.165838] | Locking API testsuite:
[ 27.165873]
[ 27.165932] | spin |wlock |rlock |mutex | wsem | rsem |
[ 27.165993] --
[ 27.166092] A-A deadlock: ok | ok | ok | ok | ok | ok |
[ 27.232682] A-B-B-A deadlock: ok | ok | ok | ok | ok | ok |
[ 27.299789] A-B-B-C-C-A deadlock: ok | ok | ok | ok | ok | ok |
[ 27.367295] A-B-C-A-B-C deadlock: ok | ok | ok | ok | ok | ok |
[ 27.434877] A-B-B-C-C-D-D-A deadlock: ok | ok | ok | ok | ok | ok |
[ 27.502857] A-B-C-D-B-D-D-A deadlock: ok | ok | ok | ok | ok
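
The idea in the quoted patch description, giving other tasks a chance to run during a long loop, can be sketched in user space. This is only an analogy and not the patch itself: cond_resched() is a kernel-internal call that reschedules only when needed, whereas this sketch uses POSIX sched_yield() as a stand-in and yields unconditionally. `YIELD_INTERVAL` and `long_lookup_loop` are hypothetical names, and the choice to yield every 1024 iterations is an assumption.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Assumed interval; the original patch may place its calls differently. */
#define YIELD_INTERVAL 1024u

/*
 * Hypothetical long-running loop: counts how many keys equal `wanted`,
 * and gives up the CPU every YIELD_INTERVAL iterations so that other
 * runnable tasks are not starved. The number of yields is reported
 * through *yields.
 */
static size_t long_lookup_loop(const unsigned *keys, size_t n,
                               unsigned wanted, size_t *yields)
{
    size_t hits = 0;
    size_t i;

    assert(keys != NULL && yields != NULL);
    *yields = 0;

    for (i = 0; i < n; i++) {
        if (keys[i] == wanted)
            hits++;

        /* User-space stand-in for a periodic cond_resched(). */
        if ((i + 1) % YIELD_INTERVAL == 0) {
            sched_yield();
            (*yields)++;
        }
    }
    return hits;
}
```

Yielding at a fixed interval rather than on every iteration keeps the overhead small while still bounding how long the loop can monopolize a CPU.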
Re: [PATCH v2] rhashtable: fix for resize events during table walk
> If rhashtable_walk_next detects a resize operation in progress, it jumps to the new table and continues walking that one. But it fails to drop the reference to its current item, leading it to continue traversing the new table's bucket in which the current item is sorted into, and after reaching that bucket's end continues traversing the new table's second bucket instead of the first one, thereby potentially missing items.
>
> This fixes the rhashtable runtime test for me. Bug probably introduced by Herbert Xu's patch eddee5ba (rhashtable: Fix walker behaviour during rehash) although not explicitly tested.
>
> Fixes: eddee5ba (rhashtable: Fix walker behaviour during rehash)
> Signed-off-by: Phil Sutter p...@nwl.cc

Yes, this fixes the error, thank you.

The new problem with the test - soft lockup - CPU#0 stuck for 22s! - is still there on 360 MHz UltraSparc IIi. I understand it is harmless, but is there some easy way to make the test avoid the NMI watchdog?

[ 58.374173] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper:1]
[ 58.374293] Modules linked in:
[ 58.374387] irq event stamp: 144
[ 58.374461] hardirqs last enabled at (143): [00404b1c] rtrap_xcall+0x18/0x20
[ 58.374621] hardirqs last disabled at (144): [00426b28] sys_call_table+0x5ac/0x744
[ 58.374788] softirqs last enabled at (142): [0045f5fc] __do_softirq+0x4fc/0x680
[ 58.374958] softirqs last disabled at (135): [0042be28] do_softirq_own_stack+0x28/0x40
[ 58.375148] CPU: 0 PID: 1 Comm: swapper Not tainted 4.2.0-rc2-00077-gf760b87 #20
[ 58.375248] task: f8001f09ef60 ti: f8001f0fc000 task.ti: f8001f0fc000
[ 58.375348] TSTATE: 004480001601 TPC: 0049663c TNPC: 00496640 Y: Not tainted
[ 58.375497] TPC: lock_is_held+0x3c/0x60
[ 58.375579] g0: 00b1d000 g1: 0002 g2: 00a88000 g3: 007f
[ 58.375699] g4: f8001f09ef60 g5: g6: f8001f0fc000 g7: 2f23003c7b80
[ 58.375817] o0: o1: 0002 o2: 0620 o3: f8001f09f560
[ 58.375937] o4: f8001f09ef60 o5: 0002 sp: f8001f0ff041 ret_pc: 0049662c
[ 58.376069] RPC: lock_is_held+0x2c/0x60
[ 58.376152] l0: f8001f09ef60 l1: 0189bc00 l2: 00b1d000 l3: 0028
[ 58.376272] l4: f8001f09f538 l5: 0008 l6: l7: 0014
[ 58.376388] i0: 018d1428 i1: 01318d18 i2: 05f8 i3:
[ 58.376506] i4: i5: 0001 i6: f8001f0ff0f1 i7: 007022d8
[ 58.376643] I7: lockdep_rht_mutex_is_held+0x18/0x40
[ 58.376715] Call Trace:
[ 58.376798] [007022d8] lockdep_rht_mutex_is_held+0x18/0x40
[ 58.376917] [00b7a6ac] test_rht_lookup.constprop.10+0x32c/0x4ac
[ 58.377029] [00b7afd0] test_rhashtable.constprop.8+0x7a4/0x1100
[ 58.377138] [00b7ba00] test_rht_init+0xd4/0x148
[ 58.377240] [00426e2c] do_one_initcall+0xec/0x1e0
[ 58.377351] [00b58b60] kernel_init_freeable+0x114/0x1c4
[ 58.377469] [0091c1ec] kernel_init+0xc/0x100
[ 58.377577] [00405fe4] ret_from_fork+0x1c/0x2c
[ 58.377663] [] (null)

-- 
Meelis Roos (mr...@linux.ee)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html