Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
On Tue, 20 Mar 2018 18:39:47 +0200, Liran Alon said:

> What is your opinion in regards if it's OK to put the flag enabling this
> "fix" in /proc/sys/net/core? Do you think it's sufficient?

Umm.. *which* /proc/sys/net/core? These could differ for things that are in
different namespaces. Or are you proposing one systemwide global value (which
also gets "interesting" if it's writable inside a container and changes the
behavior a different container sees...)
Re: linux-next: ip6tables *broken* - last base chain position %u doesn't match underflow %u (hook %u
On Tue, 20 Mar 2018 16:48:42 +0100, Florian Westphal said:

> valdis.kletni...@vt.edu wrote:
> > (Resending because I haven't heard anything)
> [ ip6tables broken ]
>
> Sorry, did not see this email before.
>
> I'll investigate asap, thanks for the detailed report.

No problem, it reverts cleanly and looks like it's 4.17 material, and finding
stuff like this is why I build linux-next kernels :)

Just remember to stick a Reported-By: on the fix :)
linux-next: ip6tables *broken* - last base chain position %u doesn't match underflow %u (hook %u
(Resending because I haven't heard anything)

Am hitting an issue with this commit:

commit 0d7df906a0e78079a02108b06d32c3ef2238ad25
Author: Florian Westphal
Date:   Tue Feb 27 19:42:37 2018 +0100

    netfilter: x_tables: ensure last rule in base chain matches underflow/policy

This trips on my system:

[ 64.402790] ip6_tables: last base chain position 1136 doesn't match underflow 1344 (hook 1)

More annoyingly, the return value means that ip6tables aren't initialized
so there's no firewall protection.

(In other words, this:

    If a (syzkaller generated) ruleset doesn't have the underflow/policy stored
    as the last rule in the base chain, then iptables will abort() because it
    doesn't find the chain policy.

ends up meaning iptables aborts anyhow.)

My iptables isn't syzkaller generated - it's mostly crufty vi-generated. ;)

Messages generated as I tried to build smaller tables to narrow down the problem:
(not sure where it gets the numbers from, as I reduced it from 50 lines down
to 3 and no real correlation to the tables I was trying to load - in particular
the numbers went up once and remained unchanged once, even though between each
try I was whacking out another 5-10 lines...)

[ 64.402790] ip6_tables: last base chain position 1136 doesn't match underflow 1344 (hook 1)
[ 1897.914828] ip6_tables: last base chain position 928 doesn't match underflow 1136 (hook 1)
[ 1954.032735] ip6_tables: last base chain position 720 doesn't match underflow 928 (hook 1)
[ 2021.813719] ip6_tables: last base chain position 920 doesn't match underflow 1128 (hook 1)
[ 2035.044103] ip6_tables: last base chain position 920 doesn't match underflow 1128 (hook 1)
[ 2060.594412] ip6_tables: last base chain position 616 doesn't match underflow 824 (hook 1)

I finally got /etc/sysconfig/ip6tables down to this:

# Generated by ip6tables-save v1.6.2 on Thu Mar 8 08:20:04 2018
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [208207395:46346275671]
[120166037:34218429901] -A INPUT -i lo+ -j ACCEPT
[129329499:129691207309] -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
[0:0] -A INPUT -j DROP
COMMIT
# Completed on Thu Mar 8 08:20:04 2018

About as minimal as it can get. :)

Any ideas?
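A side note on "not sure where it gets the numbers from": the absolute values do jump around, but the difference between the two numbers in each message does not. The following is only a small sketch that parses the six messages quoted in the report and subtracts; the helper name underflow_gaps is made up for illustration, and what the constant gap actually corresponds to inside the kernel is not something the report establishes.

```python
import re

# Kernel messages copied verbatim from the report.
DMESG = """\
[ 64.402790] ip6_tables: last base chain position 1136 doesn't match underflow 1344 (hook 1)
[ 1897.914828] ip6_tables: last base chain position 928 doesn't match underflow 1136 (hook 1)
[ 1954.032735] ip6_tables: last base chain position 720 doesn't match underflow 928 (hook 1)
[ 2021.813719] ip6_tables: last base chain position 920 doesn't match underflow 1128 (hook 1)
[ 2035.044103] ip6_tables: last base chain position 920 doesn't match underflow 1128 (hook 1)
[ 2060.594412] ip6_tables: last base chain position 616 doesn't match underflow 824 (hook 1)
"""

PATTERN = re.compile(
    r"last base chain position (\d+) doesn't match underflow (\d+) \(hook (\d+)\)"
)


def underflow_gaps(text):
    """Return (position, underflow, underflow - position) for each message."""
    rows = []
    for match in PATTERN.finditer(text):
        position, underflow, _hook = map(int, match.groups())
        rows.append((position, underflow, underflow - position))
    return rows


gaps = underflow_gaps(DMESG)
for position, underflow, gap in gaps:
    print(f"position={position:5d} underflow={underflow:5d} gap={gap}")
```

Every row comes out with the same gap of 208, so the two offsets appear to move in lockstep as rules are removed.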
linux-next 20180307 - UBSAN whine in lib/radix-tree.c
Seen in the dmesg:

[0.00]
[0.00] UBSAN: Undefined behaviour in lib/radix-tree.c:123:14
[0.00] member access within null pointer of type 'const struct radix_tree_node'
[0.00] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GT 4.16.0-rc4-next-20180307-dirty #559
[0.00] Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A20 05/08/2017
[0.00] Call Trace:
[0.00] dump_stack+0x83/0xca
[0.00] ubsan_epilogue+0xd/0x3a
[0.00] handle_null_ptr_deref+0x85/0x90
[0.00] __ubsan_handle_type_mismatch_v1+0x5e/0x70
[0.00] __radix_tree_replace+0x1e4/0x1f0
[0.00] radix_tree_iter_replace+0x25/0x50
[0.00] idr_alloc_u32+0x166/0x1f0
[0.00] idr_alloc+0x7e/0xd0
[0.00] worker_pool_assign_id+0x61/0xd0
[0.00] ? mutex_lock_nested+0x1b/0x20
[0.00] workqueue_init_early+0x58a/0xc3f
[0.00] start_kernel+0x4f7/0x809
[0.00] x86_64_start_reservations+0x40/0x61
[0.00] x86_64_start_kernel+0x7b/0x9e
[0.00] secondary_startup_64+0xa5/0xb0
[0.00]

not sure why a null 'parent' value got passed to get_slot_offset() in the
first place, but it sounds like something is missing an 'if (NULL)' test...
IPv6 issue in next-20171102 - lockdep and BUG handling RA packet.
I've hit this 6 times now, across 3 boots:

Nov 3 11:04:54 turing-police kernel: [ 547.814748] BUG: sleeping function called from invalid context at mm/slab.h:422
Nov 3 20:24:11 turing-police kernel: [ 60.093793] BUG: sleeping function called from invalid context at mm/slab.h:422
Nov 4 20:20:54 turing-police kernel: [86264.366955] BUG: sleeping function called from invalid context at mm/slab.h:422
Nov 5 19:17:40 turing-police kernel: [172469.769179] BUG: sleeping function called from invalid context at mm/slab.h:422
Nov 6 06:07:37 turing-police kernel: [211467.239460] BUG: sleeping function called from invalid context at mm/slab.h:422
Nov 6 14:12:43 turing-police kernel: [ 54.891848] BUG: sleeping function called from invalid context at mm/slab.h:422

Something seems to be going astray while handling a RA packet.

Kernel dirty due to hand-patching https://patchwork.kernel.org/patch/10003555/
(signed int:1 bitfield in sched.h causing tons of warnings)

Unfortunately, the previous next- kernel I built was -20170927 (which worked OK).
Googling for things in the traceback in the last month comes up empty, and
only thing in the git log for net/ipv6 that looks vaguely related:

commit f3d9832e56c48e4ca50bab0457e21bcaade4536d
Author: David Ahern
Date:   Wed Oct 18 09:56:52 2017 -0700

    ipv6: addrconf: cleanup locking in ipv6_add_addr

Before I go bisecting, this ring any bells?

[ 54.750340]
[ 54.754060] WARNING: inconsistent lock state
[ 54.757758] 4.14.0-rc7-next-20171102-dirty #537 Tainted: GW OE
[ 54.761488]
[ 54.765143] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 54.768954] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 54.772821] (fs_reclaim){+.?.}, at: [] fs_reclaim_acquire.part.60+0x5/0x30
[ 54.776762] {SOFTIRQ-ON-W} state was registered at:
[ 54.780407] fs_reclaim_acquire.part.60+0x29/0x30
[ 54.784056] __kmalloc+0x71/0x540
[ 54.787666] smp_store_boot_cpu_info+0xfd/0x169
[ 54.791489] native_smp_prepare_cpus+0x155/0x7fc
[ 54.795312] kernel_init_freeable+0x1f4/0x614
[ 54.799130] kernel_init+0xb/0x120
[ 54.802927] ret_from_fork+0x27/0x40
[ 54.806716] irq event stamp: 1159488
[ 54.810481] hardirqs last enabled at (1159488): [] __local_bh_enable_ip+0xae/0x150
[ 54.814186] hardirqs last disabled at (1159487): [] __local_bh_enable_ip+0x64/0x150
[ 54.830096] softirqs last enabled at (1159300): [] irq_enter+0x8c/0xd0
[ 54.833949] softirqs last disabled at (1159301): [] irq_exit+0x10b/0x160
[ 54.837745] other info that might help us debug this:
[ 54.845446] Possible unsafe locking scenario:
[ 54.853164] CPU0
[ 54.856967]
[ 54.860759] lock(fs_reclaim);
[ 54.864555]
[ 54.868315] lock(fs_reclaim);
[ 54.872096] *** DEADLOCK ***
[ 54.883265] 2 locks held by swapper/0/0:
[ 54.887028] #0: (rcu_read_lock){}, at: [] process_backlog+0xac/0x400
[ 54.891014] #1: (rcu_read_lock){}, at: [] ip6_input_finish+0x5/0xb20
[ 54.891030] stack backtrace:
[ 54.891040] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW OE 4.14.0-rc7-next-20171102-dirty #537
[ 54.891043] Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A20 05/08/2017
[ 54.891045] Call Trace:
[ 54.891051]
[ 54.891060] dump_stack+0x7b/0xe4
[ 54.891070] print_usage_bug+0x267/0x320
[ 54.891081] mark_lock+0x5f9/0x7f0
[ 54.891087] ? check_usage_backwards+0x160/0x160
[ 54.891095] ? sched_clock_cpu+0x18/0x1d0
[ 54.891101] ? sched_clock_cpu+0x18/0x1d0
[ 54.89] __lock_acquire+0x628/0x1ca0
[ 54.891121] ? sched_clock_cpu+0x18/0x1d0
[ 54.891126] ? sched_clock_cpu+0x18/0x1d0
[ 54.891135] ? __lock_acquire+0x2e3/0x1ca0
[ 54.891146] lock_acquire+0xb3/0x2f0
[ 54.891153] ? fs_reclaim_acquire.part.60+0x5/0x30
[ 54.891165] fs_reclaim_acquire.part.60+0x29/0x30
[ 54.891170] ? fs_reclaim_acquire.part.60+0x5/0x30
[ 54.891178] kmem_cache_alloc_trace+0x3f/0x500
[ 54.891186] ? cyc2ns_read_end+0x1e/0x30
[ 54.891196] ipv6_add_addr+0x15a/0xc30
[ 54.891217] ? ipv6_create_tempaddr+0x2ea/0x5d0
[ 54.891223] ipv6_create_tempaddr+0x2ea/0x5d0
[ 54.891238] ? manage_tempaddrs+0x195/0x220
[ 54.891249] ? addrconf_prefix_rcv_add_addr+0x1c0/0x4f0
[ 54.891255] addrconf_prefix_rcv_add_addr+0x1c0/0x4f0
[ 54.891268] addrconf_prefix_rcv+0x2e5/0x9b0
[ 54.891279] ? neigh_update+0x446/0xb90
[ 54.891298] ? ndisc_router_discovery+0x5ab/0xf00
[ 54.891303] ndisc_router_discovery+0x5ab/0xf00
[ 54.891311] ? retint_kernel+0x2d/0x2d
[ 54.891331] ndisc_rcv+0x1b6/0x270
[ 54.891340] icmpv6_rcv+0x6aa/0x9f0
[ 54.891345] ? ipv6_chk_mcast_addr+0x176/0x530
[ 54.891351] ? do_csum+0x17b/0x260
[ 54.891360] ip6_input_finish+0x194/0xb20
[ 54.891372] ip6_input+0x5b/0x2c0
[ 54.891380] ? ip6_rcv_finish+0x320/0x320
Re: [PATCH v2] bpf: silence warnings when building kernel/bpf/core.c with W=1
On Sun, 31 Jul 2016 21:42:22 -0700, Alexei Starovoitov said:

> and at least 2 other such patches for other files...
> Is there a single warning where -Woverride-init was useful?
> May be worth disabling this warning for the whole build?

There's a few other cases that *aren't* the "define the array to zero and then
build the entries from a list" form. In particular, there's still 3 odd complaints:

drivers/ata/ahci.c:
drivers/ata/ahci.h:393:16: warning: initialized field overwritten [-Woverride-init]
  .can_queue = AHCI_MAX_CMDS - 1,

drivers/block/drbd/drbd_main.c:
drivers/block/drbd/drbd_main.c:3767:22: warning: initialized field overwritten [-Woverride-init]
  [P_RETRY_WRITE] = "retry_write",

arch/x86/kernel/cpu/common.c:
./arch/x86/include/asm/page_64_types.h:22:21: warning: initialized field overwritten [-Woverride-init]
 #define DEBUG_STKSZ (PAGE_SIZE << DEBUG_STACK_ORDER)

The point of these patches is to make -Woverride-init *useful* - you'll never
spot 3 warnings in a flood of over 9,000 understood-and-ignored warnings. Get
rid of the 9,000 understood-and-ignored warnings, and then things that probably
*should* be looked at can be noticed.
[PATCH v2] bpf: silence warnings when building kernel/bpf/core.c with W=1
Building with W=1 generates some 350 lines of warnings of the form:

kernel/bpf/core.c: In function '__bpf_prog_run':
kernel/bpf/core.c:476:33: warning: initialized field overwritten [-Woverride-init]
   [BPF_ALU | BPF_ADD | BPF_X] = &&ALU_ADD_X,
                                 ^~
kernel/bpf/core.c:476:33: note: (near initialization for 'jumptable[12]')

Since they come from the way we intentionally build the table, silence
that one specific warning.

Signed-off-by: Valdis Kletnieks <valdis.kletni...@vt.edu>

Version 2: Add bpf: subsystem tag to subject line

diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index eed911d091da..bb915f9d9f92 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -1,3 +1,4 @@
+CFLAGS_core.o += -Wno-override-init
 obj-y := core.o
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o
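For anyone wondering why the warning fires by design here, the following is a rough model (in Python, not the kernel's C) of the "fill every slot with a default, then overwrite selected slots" pattern. The function build_table and the label strings are illustrative assumptions, not kernel code. The opcode constants are the values from the BPF uapi headers, and they do reproduce the jumptable[12] index shown in the compiler note.

```python
# BPF opcode field values as defined in the kernel's uapi BPF headers.
BPF_ALU = 0x04
BPF_ADD = 0x00
BPF_X = 0x08
BPF_K = 0x00


def build_table(size, initializers):
    """Apply (indices, value) initializers in order, the way a designated
    initializer list is processed, and record every slot written twice.

    Hypothetical helper for illustration only.
    """
    table = [None] * size
    overwritten = []
    for indices, value in initializers:
        for i in indices:
            if table[i] is not None:
                # This is the situation -Woverride-init reports.
                overwritten.append(i)
            table[i] = value
    return table, overwritten


# Miniature of the pattern: default for all 256 opcodes, then real handlers.
initializers = [
    (range(256), "default_label"),
    ([BPF_ALU | BPF_ADD | BPF_X], "ALU_ADD_X"),
    ([BPF_ALU | BPF_ADD | BPF_K], "ALU_ADD_K"),
]

table, overwritten = build_table(256, initializers)
print("slot for BPF_ALU | BPF_ADD | BPF_X:", BPF_ALU | BPF_ADD | BPF_X)
print("overwritten slots:", overwritten)
```

Every real handler necessarily lands on a slot that already holds the default, which is why each table entry produces one warning and why silencing it for this one object file loses nothing.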
Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
On Thu, 28 Jul 2016 09:45:12 +0200, Thomas Gleixner said:

> On Tue, 26 Jul 2016, nick wrote:
> > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > index f42129d..e1830af 100644
> > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > @@ -3797,7 +3797,7 @@ static irqreturn_t e1000_intr(int irq, void *data)
> >  	hw->get_link_status = 1;
> >  	/* guard against interrupt when we're going down */
> >  	if (!test_bit(__E1000_DOWN, &adapter->flags))
> > -		schedule_delayed_work(&adapter->watchdog_task, 1);
> > +		mod_work(&adapter->watchdog_task, jiffies + 1);
>
> And that's not even funny anymore. Are you using a random generator to create
> these patches?

At some point, we need to decide if the occasional accidentally-correct trivial
patch from Nick is worth all the wasted maintainer time.
Re: [PATCH v2 net] ipv6: addrconf: fix Juniper SSL VPN client regression
On Mon, 11 Jul 2016 16:43:50 +0200, Bjørn Mork said:

> Link: https://bugzilla.kernel.org/show_bug.cgi?id=121131
> Fixes: cc9da6cc4f56 ("ipv6: addrconf: use stable address generator for ARPHRD_NONE")
> Reported-by: Valdis Kletnieks <valdis.kletni...@vt.edu>
> Reported-by: Jonas Lippuner <jo...@lippuner.ca>
> Suggested-by: Hannes Frederic Sowa <han...@stressinduktion.org>
> Cc: 吉藤英明 <hideaki.yoshif...@miraclelinux.com>
> Signed-off-by: Bjørn Mork <bj...@mork.no>
> ---
> v2 changes:
> - added a netdevice private flag to suppress automatic IPv6 LL
> - suppressing only for "tun" devices

Tested against next-20160708, and the Juniper code works fine.

Feel free to stick a Tested-By: on the V2 patch...
Re: [PATCH v2 net] ipv6: addrconf: fix Juniper SSL VPN client regression
On Mon, 11 Jul 2016 16:43:50 +0200, Bjørn Mork said:

> And finally, Valdis and Jonas: could you please test this version too? It
> works for me in my simulated setup, but I don't have the Juniper client
> so I cannot verify that it actually solves the problem.

The v1 patch worked. I'll be able to test the v2 patch in a few hours.
linux-next: UBSAN whine and BUG in net/ipv4/fib_trie.c
Seeing this in next-20160606 (next-20160530 is fine), does it ring any bells
before I spend a long evening doing a bisect? The Google doesn't seem to have
seen this traceback in the past week.

[ 226.938222]
[ 226.938231] UBSAN: Undefined behaviour in net/ipv4/fib_trie.c:1573:14
[ 226.938235] shift exponent 136 is too large for 64-bit type 'long unsigned int'
[ 226.938403]
[ 226.938406] UBSAN: Undefined behaviour in net/ipv4/fib_trie.c:1589:22
[ 226.938409] shift exponent 136 is too large for 64-bit type 'long unsigned int'
[ 226.938434] Call Trace:
[ 226.938437] [] dump_stack+0x7b/0xd1
[ 226.938441] [] ubsan_epilogue+0xd/0x40
[ 226.938445] [] __ubsan_handle_shift_out_of_bounds+0xf9/0x150
[ 226.938449] [] ? cpuacct_account_field+0x251/0x2b0
[ 226.938453] [] ? bh_lru_install+0x244/0x2c0
[ 226.938456] [] leaf_walk_rcu+0x302/0x440
[ 226.938460] [] fib_table_dump+0x6b/0x440
[ 226.938464] [] ? inet_dump_fib+0x74/0x370
[ 226.938468] [] inet_dump_fib+0x142/0x370
[ 226.938471] [] ? inet_dump_fib+0x74/0x370
[ 226.938475] [] rtnl_dump_all+0x12c/0x350
[ 226.938479] [] ? __alloc_skb+0x96/0x2c0
[ 226.938482] [] netlink_dump+0x174/0x3e0
[ 226.938486] [] __netlink_dump_start+0x190/0x240
[ 226.938490] [] rtnetlink_rcv_msg+0x1c0/0x640
[ 226.938493] [] ? trace_hardirqs_on_caller+0x16/0x2c0
[ 226.938497] [] ? fdb_vid_parse+0x90/0x90
[ 226.938500] [] ? fdb_vid_parse+0x90/0x90
[ 226.938504] [] ? rtnl_link_unregister+0x140/0x140
[ 226.938508] [] netlink_rcv_skb+0x87/0xc0
[ 226.938511] [] rtnetlink_rcv+0x2a/0x40
[ 226.938515] [] netlink_unicast+0x200/0x300
[ 226.938518] [] netlink_sendmsg+0x402/0x670
[ 226.938523] [] sock_sendmsg+0x5b/0xd0
[ 226.938526] [] SYSC_sendto+0x153/0x1f0
[ 226.938531] [] ? selinux_socket_setsockopt+0x45/0x60
[ 226.938535] [] ? entry_SYSCALL_64_fastpath+0x5/0xa8
[ 226.938538] [] ? trace_hardirqs_on_caller+0x16/0x2c0
[ 226.938541] [] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 226.938545] [] SyS_sendto+0xe/0x10
[ 226.938549] [] entry_SYSCALL_64_fastpath+0x18/0xa8
[ 226.938553] [] ? trace_hardirqs_off_caller+0x1f/0xf0

followed by a not-surprising BUG while we pagefault because we went off the deep end:

[ 226.938555]
[ 226.938559] BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1309
[ 226.938563] in_atomic(): 0, irqs_disabled(): 0, pid: 4577, name: geoclue
[ 226.938565] INFO: lockdep is turned off.
[ 226.938591] Call Trace:
[ 226.938595] [] dump_stack+0x7b/0xd1
[ 226.938599] [] ___might_sleep+0x196/0x2f0
[ 226.938603] [] __might_sleep+0x65/0x1f0
[ 226.938607] [] __do_page_fault+0x5b6/0x7d0
[ 226.938611] [] do_page_fault+0xc/0x10
[ 226.938614] [] page_fault+0x22/0x30
[ 226.938619] [] ? leaf_walk_rcu+0x195/0x440
[ 226.938622] [] ? leaf_walk_rcu+0x175/0x440
[ 226.938626] [] fib_table_dump+0x6b/0x440
[ 226.938630] [] ? inet_dump_fib+0x74/0x370
[ 226.938633] [] inet_dump_fib+0x142/0x370
[ 226.938637] [] ? inet_dump_fib+0x74/0x370
[ 226.938641] [] rtnl_dump_all+0x12c/0x350
[ 226.938644] [] ? __alloc_skb+0x96/0x2c0
[ 226.938648] [] netlink_dump+0x174/0x3e0
[ 226.938651] [] __netlink_dump_start+0x190/0x240
[ 226.938655] [] rtnetlink_rcv_msg+0x1c0/0x640
[ 226.938658] [] ? trace_hardirqs_on_caller+0x16/0x2c0
[ 226.938662] [] ? fdb_vid_parse+0x90/0x90
[ 226.938666] [] ? fdb_vid_parse+0x90/0x90
[ 226.938669] [] ? rtnl_link_unregister+0x140/0x140
[ 226.938673] [] netlink_rcv_skb+0x87/0xc0
[ 226.938677] [] rtnetlink_rcv+0x2a/0x40
[ 226.938680] [] netlink_unicast+0x200/0x300
[ 226.938684] [] netlink_sendmsg+0x402/0x670
[ 226.938688] [] sock_sendmsg+0x5b/0xd0
[ 226.938692] [] SYSC_sendto+0x153/0x1f0
[ 226.938696] [] ? selinux_socket_setsockopt+0x45/0x60
[ 226.938700] [] ? entry_SYSCALL_64_fastpath+0x5/0xa8
[ 226.938703] [] ? trace_hardirqs_on_caller+0x16/0x2c0
[ 226.938706] [] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 226.938710] [] SyS_sendto+0xe/0x10
[ 226.938714] [] entry_SYSCALL_64_fastpath+0x18/0xa8
[ 226.938718] [] ? trace_hardirqs_off_caller+0x1f/0xf0

and then the wheels come totally off the bus:

[ 226.938728] BUG: unable to handle kernel paging request at 000f6105
[ 226.938733] IP: [] leaf_walk_rcu+0x195/0x440
[ 226.938738] PGD 0
[ 226.938742] Oops: [#1] PREEMPT SMP
[ 226.938845] Call Trace:
[ 226.938849] [] fib_table_dump+0x6b/0x440
[ 226.938853] [] ? inet_dump_fib+0x74/0x370
[ 226.938857] [] inet_dump_fib+0x142/0x370
[ 226.938860] [] ? inet_dump_fib+0x74/0x370
[ 226.938864] [] rtnl_dump_all+0x12c/0x350
[ 226.938867] [] ? __alloc_skb+0x96/0x2c0
[ 226.938871] [] netlink_dump+0x174/0x3e0
[ 226.938874] [] __netlink_dump_start+0x190/0x240
[ 226.938878] []
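For readers not used to UBSAN output: the "shift exponent 136 is too large for 64-bit type" complaint means a shift count of 136 was applied to a 64-bit unsigned value, which C leaves undefined for any count at or above the type width. Below is a minimal sketch of that rule, modelled in Python; the function checked_shl is hypothetical and says nothing about why fib_trie.c arrived at 136.

```python
def checked_shl(value, shift, width=64):
    """Left-shift `value` as an unsigned `width`-bit integer.

    Rejects the shift counts that are undefined in C (negative, or greater
    than or equal to the bit width). Hypothetical helper for illustration.
    """
    if shift < 0:
        raise ValueError(f"shift exponent {shift} is negative")
    if shift >= width:
        raise ValueError(
            f"shift exponent {shift} is too large for {width}-bit type"
        )
    # Mask to emulate fixed-width unsigned wraparound of the result.
    return (value << shift) & ((1 << width) - 1)


print(checked_shl(1, 3))
try:
    checked_shl(1, 136)
except ValueError as exc:
    print("rejected:", exc)
```

With an undefined shift the result is whatever the hardware happens to produce, which is consistent with the bogus pointer and page fault that follow in the trace.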
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said:

> On Sun, 2016-04-24 at 15:56 -0400, valdis.kletni...@vt.edu wrote:
> > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:
> >
> > > >>> + return !debug_locks ||
> > > >>> +        lockdep_is_held(&sk->sk_lock) ||
> >
> > > Issue here is that once lockdep detected a problem (not necessarily in
> > > net/ tree btw), your helper always 'detect' a problem, since lockdep
> > > automatically disables itself.
> >
> > "D'Oh!" -- H. Simpson
> >
> > I thought this patch looked suspect, but couldn't put my finger on it. The
> > reason why I got like 41,000 of them is because I built a kernel that has
> > lockdep enabled, but I have an out-of-tree module that doesn't init something,
> > so I get this:
> >
> > [ 48.898156] INFO: trying to register non-static key.
> > [ 48.898157] the code is fine but needs lockdep annotation.
> > [ 48.898157] turning off the locking correctness validator.
> >
> > After which point, even with this patch, every time through it's still going to
> > explode.
>
> Which patch are you talking about ?

The one that adds the !debug_locks check - once my out-of-kernel module hits
something that turns off lockdep, it's *still* going to complain on pretty much
all the same packets it complained about earlier.

I thought it looked suspicious, but you clarified why...
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:

> >>> + return !debug_locks ||
> >>> +        lockdep_is_held(&sk->sk_lock) ||

> Issue here is that once lockdep detected a problem (not necessarily in
> net/ tree btw), your helper always 'detect' a problem, since lockdep
> automatically disables itself.

"D'Oh!" -- H. Simpson

I thought this patch looked suspect, but couldn't put my finger on it. The
reason why I got like 41,000 of them is because I built a kernel that has
lockdep enabled, but I have an out-of-tree module that doesn't init something,
so I get this:

[ 48.898156] INFO: trying to register non-static key.
[ 48.898157] the code is fine but needs lockdep annotation.
[ 48.898157] turning off the locking correctness validator.

After which point, even with this patch, every time through it's still going to
explode.
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:

> Hi,
>
> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
> > linux-next 20160420 is whining at an incredible rate - in 20 minutes of
> > uptime, I piled up some 41,000 hits from all over the place (cleaned up
> > to skip the CPU and PID so the list isn't quite so long):
>
> Thanks for the report. Can you give me some more details:
>
> Is this an nfs socket? Do you by accident know if this socket went
> through xs_reclassify_socket at any point? We do hold the appropriate
> locks at that point but I fear that the lockdep reinitialization
> confused lockdep.

It wasn't an NFS socket, as NFS wasn't even active at the time.

I'm reasonably sure that multiple sockets were in play, given that
tcp_v6_rcv and udpv6_queue_rcv_skb were both implicated. I strongly
suspect that pretty much any IPv6 traffic could do it - the frequency
dropped off quite a bit when I closed firefox, which is usually a heavy
network hitter on my laptop.
IPv6 patch mysteriously breaks IPv4 VPN
I'll say up front - no, I do *not* have a clue why this commit causes
this problem - it makes exactly zero fsking sense.

Scenario: $WORK is blessed with a Juniper VPN system. I've been seeing
for a while now (since Dec-ish) an issue where at startup, the tun0 device
will get wedged. ifconfig reports this:

tun0: flags=4305 mtu 1400
        inet 172.27.1.165 netmask 255.255.255.255 destination 172.27.1.165
        inet6 fe80::6802:d95c:f3f4:2a6f prefixlen 64 scopeid 0x20
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 500 (UNSPEC)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1 bytes 48 (48.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

and no more packets cross - not even a ping. Yes, the tunnel is ipv4 only,
and only ipv4 routes get set by the VPN software.

bisect results confirmed - linux-next 20160327 is bad, but 20160420 with this
one commit reverted works.

% git bisect bad
cc9da6cc4f56e05cc9e591459fe0192727ff58b3 is the first bad commit
commit cc9da6cc4f56e05cc9e591459fe0192727ff58b3
Author: Bjørn Mork
Date:   Wed Dec 16 16:44:38 2015 +0100

    ipv6: addrconf: use stable address generator for ARPHRD_NONE

    Add a new address generator mode, using the stable address generator
    with an automatically generated secret. This is intended as a default
    address generator mode for device types with no EUI64 implementation.
    The new generator is used for ARPHRD_NONE interfaces initially, adding
    default IPv6 autoconf support to e.g. tun interfaces.

    If the addrgenmode is set to 'random', either by default or manually,
    and no stable secret is available, then a random secret is used as
    input for the stable-privacy address generator. The secret can be
    read and modified like manually configured secrets, using the proc
    interface. Modifying the secret will change the addrgen mode to
    'stable-privacy' to indicate that it operates on a known secret.

    Existing behaviour of the 'stable-privacy' mode is kept unchanged. If
    a known secret is available when the device is created, then the mode
    will default to 'stable-privacy' as before. The mode can be manually
    set to 'random' but it will behave exactly like 'stable-privacy' in
    this case. The secret will not change.

    Cc: Hannes Frederic Sowa
    Cc: 吉藤英明
    Signed-off-by: Bjørn Mork
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

(Sorry for the delay in reporting this - bisecting this proved to be a bear and
a half, because this problematic commit landed only about 10 commits after this one:

git bisect start
# good: [1bd4978a88ac2589f3105f599b1d404a312fb7f6] tun: honor IFF_UP in tun_get_user()

which fixed a *different* issue that prevented the tun device from getting
created at all (or it was immediately taken back down by the VPN software).

End result was that unless I gave a "known good" start point in that dozen
commit range, there'd be a month's worth of 'git bisect skip' to wade through.
I got damned lucky and found a record on one of my servers of an ssh over VPN,
and correlated it to the one day that linux-next had the above fix for the
previous issue, and wasn't broken by this current issue.)
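To illustrate why a known-good point inside that dozen-commit window mattered so much, here is a simplified model of bisection. It is a sketch only: it assumes a linear history in which every commit after the first bad one is also bad, the commit names c0..c12 are made up, and the function first_bad is hypothetical (git itself bisects a commit graph and handles skipped commits, which this does not).

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits[0] is known good and commits[-1] is known bad.
    Returns (first_bad_commit, number_of_commits_tested).
    """
    lo, hi = 0, len(commits) - 1  # lo: newest known good, hi: oldest known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tested


# Hypothetical window of 13 commits where the breakage starts at c10.
commits = [f"c{i}" for i in range(13)]
bad_from = 10
culprit, tested = first_bad(commits, lambda c: int(c[1:]) >= bad_from)
print(culprit, "found after testing", tested, "commits")
```

With the good endpoint inside the window, only a handful of builds are needed; with the good endpoint a month earlier, most midpoints would land on kernels where the tun device could not be tested at all.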
linux-next: zillions of lockdep whinges in include/net/sock.h:1408
linux-next 20160420 is whining at an incredible rate - in 20 minutes of
uptime, I piled up some 41,000 hits from all over the place (cleaned up
to skip the CPU and PID so the list isn't quite so long):

% grep include/net/sock.h /var/log/messages | cut -f5- -d: | sed -e 's/PID: [0-9]* /PID: (elided) /' -e 's/CPU: [0-3]/CPU: +/' | sort | uniq -c | sort -nr
  13468 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_rcv+0xc20/0xcb0
   9770 CPU: + PID: (elided) at include/net/sock.h:1408 udp_queue_rcv_skb+0x3ca/0x6d0
   7706 CPU: + PID: (elided) at include/net/sock.h:1408 sock_owned_by_user+0x91/0xa0
   2818 CPU: + PID: (elided) at include/net/sock.h:1408 udpv6_queue_rcv_skb+0x3b6/0x6d0
   1981 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_write_timer+0xf2/0x110
   1954 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_delack_timer+0x110/0x130
   1912 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_keepalive_timer+0x136/0x2c0
    882 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_close+0x226/0x4f0
    804 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_tasklet_func+0x192/0x1e0
     28 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_child_process+0x17a/0x350
      2 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_err+0x401/0x660
      2 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_err+0x1fd/0x660

Seems to be from this commit, which is apparently over-stringent or
isn't handling some case correctly:

commit fafc4e1ea1a4c1eb13a30c9426fb799f5efacbc3
Author: Hannes Frederic Sowa
Date:   Fri Apr 8 15:11:27 2016 +0200

    sock: tigthen lockdep checks for sock_owned_by_user

    sock_owned_by_user should not be used without socket lock held. It seems
    to be a common practice to check .owned before lock reclassification, so
    provide a little help to abstract this check away.

    Cc: linux-c...@vger.kernel.org
    Cc: linux-blueto...@vger.kernel.org
    Cc: linux-...@vger.kernel.org
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller
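If the grep/cut/sed/sort/uniq pipeline is hard to read, roughly the same tally can be done as below. This is a sketch under assumptions: the three sample log lines are made up to resemble syslog-prefixed WARNING lines (the report does not show the raw lines), and tally_sock_warnings is a hypothetical helper, not an existing tool.

```python
import re
from collections import Counter


def tally_sock_warnings(lines):
    """Count warnings per call site, ignoring which CPU and PID hit them."""
    counts = Counter()
    for line in lines:
        if "include/net/sock.h" not in line:
            continue
        start = line.find("CPU:")
        if start < 0:
            continue
        site = line[start:].strip()
        # Mask the fields that vary between otherwise identical warnings.
        site = re.sub(r"PID: \d+ ", "PID: (elided) ", site)
        site = re.sub(r"CPU: \d+", "CPU: +", site)
        counts[site] += 1
    return counts.most_common()


# Made-up sample input, shaped like syslog lines carrying the warning.
SAMPLE = [
    "Apr 20 20:10:01 host kernel: [ 100.000001] WARNING: CPU: 2 PID: 1234 at include/net/sock.h:1408 tcp_v6_rcv+0xc20/0xcb0",
    "Apr 20 20:10:02 host kernel: [ 101.000001] WARNING: CPU: 0 PID: 77 at include/net/sock.h:1408 tcp_v6_rcv+0xc20/0xcb0",
    "Apr 20 20:10:03 host kernel: [ 102.000001] WARNING: CPU: 1 PID: 9 at include/net/sock.h:1408 udp_queue_rcv_skb+0x3ca/0x6d0",
    "Apr 20 20:10:04 host kernel: [ 103.000001] something unrelated",
]

result = tally_sock_warnings(SAMPLE)
for site, count in result:
    print(f"{count:7d} {site}")
```

Masking the CPU and PID is what collapses tens of thousands of lines into a dozen distinct call sites.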
Re: next-20151207 - crash in IPv6 code
On Tue, 08 Dec 2015 12:34:09 +0100, Florian Westphal said:

> Valdis Kletnieks <valdis.kletni...@vt.edu> wrote:
>
> [ CC Pablo ]
>
> > Seen this in 2 boots out of two on next-20151207 when IPV6 networking
> > was available. It was stable when no net was available. Also,
> > next-20151127 is OK.
> > Haven't bisected it yet - this ring any bells?
>
> Thanks for the report, my fault -- its caused by
> 029f7f3b8701cc7aca8bdb which is only in Pablos nf-next tree.
>
> This should fix this bug (proper patch w. changelog coming
> after more testing):
>
> diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c
> b/net/ipv6/netfilter/nf_conntrack_reasm.c

Pumped about 100M of IPv6 traffic through, and no problems.

Feel free to stick a Reported-by:/Tested-By: on this patch...
next-20151207 - crash in IPv6 code
Seen this in 2 boots out of two on next-20151207 when IPV6 networking
was available. It was stable when no net was available. Also,
next-20151127 is OK.

Haven't bisected it yet - this ring any bells?

[ 92.231022] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 92.231035] IP: [] nf_ct_frag6_gather+0x81b/0xba0
[ 92.231046] PGD 0
[ 92.231050] Oops: [#1] PREEMPT SMP
[ 92.231166] Call Trace:
[ 92.231170]
[ 92.231196] [] ipv6_defrag+0x66/0x80
[ 92.231206] [] nf_iterate+0x62/0x80
[ 92.231216] [] nf_hook_slow+0xba/0x1b0
[ 92.231225] [] ? nf_hook_slow+0x5/0x1b0
[ 92.231235] [] ipv6_rcv+0x83d/0x8d0
[ 92.231242] [] ? ipv6_rcv+0x3e/0x8d0
[ 92.231251] [] ? ip6_input_finish+0x7e0/0x7e0
[ 92.231260] [] __netif_receive_skb_core+0x60a/0xd70
[ 92.231269] [] __netif_receive_skb+0x20/0x90
[ 92.231278] [] netif_receive_skb_internal+0x70/0x1f0
[ 92.231285] [] ? netif_receive_skb_internal+0x25/0x1f0
[ 92.231292] [] ? eth_type_trans+0x11b/0x200
[ 92.231300] [] netif_receive_skb+0x59/0x170
[ 92.231308] [] ieee80211_deliver_skb+0x120/0x180
[ 92.231315] [] ieee80211_rx_handlers+0x2762/0x29f0
[ 92.231324] [] ? skb_queue_tail+0x20/0x50
[ 92.231335] [] ? do_raw_spin_lock+0x148/0x1e0
[ 92.231342] [] ? trace_hardirqs_on_caller+0x16/0x1b0
[ 92.231358] [] ieee80211_prepare_and_rx_handle+0x24e/0xa80
[ 92.231365] [] ? ieee80211_rx_napi+0x23a/0xf00
[ 92.231373] [] ieee80211_rx_napi+0x537/0xf00
[ 92.231380] [] ? ieee80211_rx_napi+0x23a/0xf00
[ 92.231391] [] ieee80211_tasklet_handler+0xc5/0xd0
[ 92.231401] [] tasklet_action+0x1d5/0x220
[ 92.231409] [] __do_softirq+0xec/0x5a0
[ 92.231417] [] irq_exit+0xd4/0xe0
[ 92.231426] [] do_IRQ+0x6a/0x120
[ 92.231434] [] common_interrupt+0x89/0x89
[ 92.231440]
[ 92.231450] [] ? cpuidle_enter_state+0x1ac/0x410
[ 92.231458] [] ? trace_hardirqs_on+0xd/0x10
[ 92.231466] [] ? cpuidle_enter_state+0x1b7/0x410
[ 92.231476] [] ? cpuidle_enter_state+0x1ac/0x410
[ 92.231485] [] cpuidle_enter+0x17/0x20
[ 92.231494] [] cpu_startup_entry+0x48d/0x520
[ 92.231503] [] start_secondary+0x154/0x170
[ 92.231510] Code: 8b fd ff ff 48 8b 13 48 89 10 49 8b 0e 49 39 ce 0f 84 80 01 00 00 48 8b 11 48 39 d3 0f 84 71 01 00 00 49 39 d6 0f 84 6b 01 00 00 <48> 8b 0a 48 39 cb 0f 84 59 01 00 00 48 89 ca 49 39 d6 75 ec e9
[ 92.231685] RIP [] nf_ct_frag6_gather+0x81b/0xba0
[ 92.231698] RSP
[ 92.231704] CR2:
[ 92.231714] ---[ end trace 62089aaf8d90e56a ]---
[ 94.678192] Kernel panic - not syncing: Fatal exception in interrupt
[ 94.678228] Kernel Offset: 0x3300 from 0x8100 (relocation range: 0x8000-0xbfff)
e1000e driver - commit 37b12910 needs to be in Linus tree..
Hate to bother everybody, but we're at -rc7..

Commit 83129b37ef is still in Linus's tree. This one causes a crash on my Dell Latitude after 4 hours uptime.

commit 83129b37ef35bb6a7f01c060129736a8db5d31c4
Author: Yanir Lubetkin yanirx.lubet...@intel.com
Date: Tue Jun 2 17:05:45 2015 +0300

    e1000e: fix systim issues

Commit 37b12910dd fixes it, but is *NOT* in Linus's tree yet.

commit 37b12910dd11d9ab969f2c310dc9160b7f3e3405
Author: Raanan Avargil raanan.avar...@intel.com
Date: Sun Jul 19 16:33:20 2015 +0300

    e1000e: Fix tight loop implementation of systime read algorithm

Can we do something to get this fix in-tree, or revert the problem commit?
next-20150714 - busted IPv6 source address selection...
next-20150714 w/ one ethernet commit reverted. -0706 w/ same revert works.

ip addr show for the wireless says:

5: wlp3s0b1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether bc:85:56:1f:4f:6d brd ff:ff:ff:ff:ff:ff
    inet 172.30.42.75/27 brd 172.30.42.95 scope global dynamic wlp3s0b1
       valid_lft 80952sec preferred_lft 80952sec
    inet6 2601:5c0:c100:9243:449b:a30:2f54:5b/64 scope global temporary dynamic
       valid_lft 7181sec preferred_lft 1781sec
    inet6 2601:5c0:c100:9243:be85:56ff:fe1f:4f6d/64 scope global mngtmpaddr dynamic
       valid_lft 7181sec preferred_lft 1781sec
    inet6 fe80::be85:56ff:fe1f:4f6d/64 scope link
       valid_lft forever preferred_lft forever

but outbound packets have a bogus source address:

23:01:52.938986 IP6 a835::100:0:200:0.41360 > 2001:500:19::1.domain: 41835% [1au] ? c0.info.afilias-nst.info. (53)
23:05:54.131991 IP6 a835::100:0:200:0 > 2001:468:c80:2105:211:43ff:feda:d769: ICMP6, echo request, seq 56, length 64

Interestingly enough, link-local addresses manage to get it wrong in another fashion:

23:02:52.806011 IP6 fe80::120d:7fff:fe64:ca0b.60625 > ff02::c.ssdp: UDP, length 411

I suspected this commit mostly because it's the only one that seems to touch source selection recently...

commit 9131f3de24db4dc12199aede7d931e6703e97f3b
Author: YOSHIFUJI Hideaki/吉藤英明 hideaki.yoshif...@miraclelinux.com
Date: Fri Jul 10 16:58:31 2015 +0900

    ipv6: Do not iterate over all interfaces when finding source address on specific interface.

Reverting this commit makes things work again.
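For readers who want to double-check the "bogus source address" observation: the two stable addresses that ip reports follow from the interface MAC via the modified EUI-64 rule, and neither source address seen in the captures is among the configured ones. The following is a minimal sketch using only Python's standard ipaddress module; the helper names (eui64_interface_id, slaac_address) are made up for illustration, and nothing here models the kernel's actual source-selection code.

```python
import ipaddress


def eui64_interface_id(mac: str) -> int:
    """Return the modified EUI-64 interface identifier for a 48-bit MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]
    return int.from_bytes(bytes(eui), "big")


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the EUI-64 identifier derived from mac."""
    net = ipaddress.IPv6Network(prefix)
    return ipaddress.IPv6Address(int(net.network_address) | eui64_interface_id(mac))


# Values copied from the report above.
MAC = "bc:85:56:1f:4f:6d"
CONFIGURED = {
    ipaddress.IPv6Address("2601:5c0:c100:9243:449b:a30:2f54:5b"),
    ipaddress.IPv6Address("2601:5c0:c100:9243:be85:56ff:fe1f:4f6d"),
    ipaddress.IPv6Address("fe80::be85:56ff:fe1f:4f6d"),
}
OBSERVED_SOURCES = [
    ipaddress.IPv6Address("a835::100:0:200:0"),
    ipaddress.IPv6Address("fe80::120d:7fff:fe64:ca0b"),
]

if __name__ == "__main__":
    print("link-local from MAC:", slaac_address("fe80::/64", MAC))
    print("global from MAC:    ", slaac_address("2601:5c0:c100:9243::/64", MAC))
    for src in OBSERVED_SOURCES:
        print(src, "configured" if src in CONFIGURED else "NOT configured")
```

The check only shows that the captured sources do not belong to the interface; it says nothing about where the wrong bytes come from.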
Re: [PATCH v2] add stealth mode
On Thu, 02 Jul 2015 10:56:01 +0200, Matteo Croce said:
> Add option to disable any reply not related to a listening socket,
> like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
> Also disables ICMP replies to echo request and timestamp.
> The stealth mode can be enabled selectively for a single interface.

A few notes.

1) Do you have an actual use case where an iptables '-j DROP' isn't usable?

2) You *do* realize that this isn't anywhere near sufficient in order to actually make your machine invisible, right? (Hint: What *other* packets can be sent to a machine to provoke a response?)

3) At least my copy had massive whitespace damage, where all the tab characters appear to have evaporated.
e1000e driver - hang after 4 hours of uptime - finally bisected!
(follow up to a report from last week - bisecting took a while as I could only do 1 or 2 tests an evening)

My Dell Latitude E6530 crashes with a specific kernel lockup almost exactly 4 hours after boot if there isn't a cable connected to the Ethernet port:

[14508.846327] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[14468.229720] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[14463.254791] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[14491.134413] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
[14463.396593] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
[14490.390223] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
[14494.680591] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[14513.365378] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
[14482.271716] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
[14479.906820] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0

As far as I can tell, the timestamp jitter is just how long it takes me to enter the cryptLUKS passphrase for the hard drive at boot...

lspci tells me:

lspci -vvv -s 00:19.0
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
        DeviceName: Onboard LAN
        Subsystem: Dell Device 0535
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 28
        Region 0: Memory at f770 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at f7739000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at f040 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: fee00318 Data:
        Capabilities: [e0] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: e1000e

The traceback always looks like:

[14479.906820] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[14479.906908] Call Trace:
[14479.906914] NMI [ba94db16] dump_stack+0x50/0xa8
[14479.906930] [ba948bb9] panic+0xcd/0x1e4
[14479.906940] [ba166a60] ? perf_event_task_disable+0xc0/0xc0
[14479.906952] [ba125d8b] watchdog_overflow_callback+0x9b/0xa0
[14479.906959] [ba16a684] __perf_event_overflow+0xc4/0x1f0
[14479.906968] [ba16b3a4] perf_event_overflow+0x14/0x20
[14479.906976] [ba022271] intel_pmu_handle_irq+0x1e1/0x430
[14479.906990] [ba01a0f6] perf_event_nmi_handler+0x26/0x40
[14479.906999] [ba0085b3] nmi_handle+0x103/0x340
[14479.907005] [ba0084b5] ? nmi_handle+0x5/0x340
[14479.907017] [ba008a53] default_do_nmi+0xc3/0x120
[14479.907032] [ba008b98] do_nmi+0xe8/0x130
[14479.907044] [ba95c9a8] end_repeat_nmi+0x1e/0x2e
[14479.907055] [ba529886] ? e1000e_cyclecounter_read+0x16/0xc0
[14479.907061] [ba529886] ? e1000e_cyclecounter_read+0x16/0xc0
[14479.907069] [ba529886] ? e1000e_cyclecounter_read+0x16/0xc0
[14479.907075] EOE [ba0e9529] timecounter_read+0x19/0x60
[14479.907088] [ba53687e] e1000e_phc_gettime+0x2e/0x60
[14479.907098] [ba536a31] e1000e_systim_overflow_work+0x31/0x70
[14479.907105] [ba07ad19] process_one_work+0x3c9/0x980
[14479.907115] [ba07ac62] ? process_one_work+0x312/0x980
[14479.907125] [ba07b348] ? worker_thread+0x78/0x760
[14479.907134] [ba07b59c] worker_thread+0x2cc/0x760
[14479.907144] [ba07b2d0] ? process_one_work+0x980/0x980
[14479.907154] [ba082a5e] kthread+0xfe/0x120
[14479.907163] [ba08ca50] ? finish_task_switch+0x50/0x1c0
[14479.907173] [ba082960] ? kthread_create_on_node+0x270/0x270
[14479.907179] [ba95ae4f] ret_from_fork+0x3f/0x70
[14479.907188] [ba082960] ? kthread_create_on_node+0x270/0x270
[14479.907243] Kernel Offset: 0x3900 from 0x8100 (relocation range: 0x8000-0xbfff)

Bisection tells me it's this commit:

commit 83129b37ef35bb6a7f01c060129736a8db5d31c4
Author: Yanir Lubetkin yanirx.lubet...@intel.com
Date: Tue Jun 2 17:05:45 2015 +0300

    e1000e: fix systim issues

    Two issues involving systim were reported.
    1. Clock is not running in the correct frequency
    2. In some situations, systim values were not incremented linearly
    This patch fixes the hardware clock configuration and the
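The "almost exactly 4 hours" claim can be checked with simple arithmetic on the ten panic timestamps listed in the report. The sketch below is illustrative only: it subtracts four hours (14400 s) from each uptime and looks at what is left over. Treating four hours as the reference point comes from the report's own wording; whether the driver really schedules anything on that period is not established here.

```python
# Kernel uptimes (seconds) at the moment of each hard-LOCKUP panic,
# copied from the report above.
PANIC_UPTIMES = [
    14508.846327, 14468.229720, 14463.254791, 14491.134413, 14463.396593,
    14490.390223, 14494.680591, 14513.365378, 14482.271716, 14479.906820,
]

FOUR_HOURS = 4 * 60 * 60  # 14400 seconds


def offsets_past(reference: float, uptimes: list[float]) -> list[float]:
    """Seconds by which each uptime exceeds the reference point."""
    return [t - reference for t in uptimes]


OFFSETS = offsets_past(FOUR_HOURS, PANIC_UPTIMES)

if __name__ == "__main__":
    for uptime, extra in zip(PANIC_UPTIMES, OFFSETS):
        print(f"{uptime:.3f} s = 4h + {extra:.1f} s")
    print(f"spread: {max(OFFSETS) - min(OFFSETS):.1f} s")
```

Every crash lands between one and two minutes past the four-hour mark, a spread of under a minute, which fits the explanation that the jitter is just passphrase-typing time at boot.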
Re: next-20150610 - repeated hangs at e1000e_phc_gettime+0x2e/0x60
On Thu, 11 Jun 2015 22:57:48 -0400, Valdis Kletnieks said:
> 0) next-20150603 works, so the problem landed in linux-next in the last week.
>
> 1) All 3 times happened while I was at home, using wireless, so the interface
> didn't have link and was ifconfig'ed down.

All 3 crashes happened at almost exactly 4 hours of uptime, but here in my office I'm now at 6 hours on the same kernel while running with the interface plugged in and doing traffic.

I have a fighting chance of mostly finishing a bisect over the weekend, I'll let you know where that leads.
next-20150610 - repeated hangs at e1000e_phc_gettime+0x2e/0x60
I'm seeing repeated hard lockups on my Dell Latitude E6530. Helpful info:

0) next-20150603 works, so the problem landed in linux-next in the last week.

1) All 3 times happened while I was at home, using wireless, so the interface didn't have link and was ifconfig'ed down.

2) Remarkably similar times for it to blow up:

[14513.365378] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
[14482.271716] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
[14479.906820] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0

(I suspect the offsets were caused by differences in how long it took me to correctly enter the cryptLUKS passphrase for my encrypted root filesystem)

Oddly enough, I don't see any patches to the e1000e driver in quite some time... but that's where it keeps locking up. This ringing any bells?

All 3 traces look like:

[14479.906908] Call Trace:
[14479.906914] NMI [ba94db16] dump_stack+0x50/0xa8
[14479.906930] [ba948bb9] panic+0xcd/0x1e4
[14479.906940] [ba166a60] ? perf_event_task_disable+0xc0/0xc0
[14479.906952] [ba125d8b] watchdog_overflow_callback+0x9b/0xa0
[14479.906959] [ba16a684] __perf_event_overflow+0xc4/0x1f0
[14479.906968] [ba16b3a4] perf_event_overflow+0x14/0x20
[14479.906976] [ba022271] intel_pmu_handle_irq+0x1e1/0x430
[14479.906990] [ba01a0f6] perf_event_nmi_handler+0x26/0x40
[14479.906999] [ba0085b3] nmi_handle+0x103/0x340
[14479.907005] [ba0084b5] ? nmi_handle+0x5/0x340
[14479.907017] [ba008a53] default_do_nmi+0xc3/0x120
[14479.907032] [ba008b98] do_nmi+0xe8/0x130
[14479.907044] [ba95c9a8] end_repeat_nmi+0x1e/0x2e
[14479.907055] [ba529886] ? e1000e_cyclecounter_read+0x16/0xc0
[14479.907061] [ba529886] ? e1000e_cyclecounter_read+0x16/0xc0
[14479.907069] [ba529886] ? e1000e_cyclecounter_read+0x16/0xc0
[14479.907075] EOE [ba0e9529] timecounter_read+0x19/0x60
[14479.907088] [ba53687e] e1000e_phc_gettime+0x2e/0x60
[14479.907098] [ba536a31] e1000e_systim_overflow_work+0x31/0x70
[14479.907105] [ba07ad19] process_one_work+0x3c9/0x980
[14479.907115] [ba07ac62] ? process_one_work+0x312/0x980
[14479.907125] [ba07b348] ? worker_thread+0x78/0x760
[14479.907134] [ba07b59c] worker_thread+0x2cc/0x760
[14479.907144] [ba07b2d0] ? process_one_work+0x980/0x980
[14479.907154] [ba082a5e] kthread+0xfe/0x120
[14479.907163] [ba08ca50] ? finish_task_switch+0x50/0x1c0
[14479.907173] [ba082960] ? kthread_create_on_node+0x270/0x270
[14479.907179] [ba95ae4f] ret_from_fork+0x3f/0x70
[14479.907188] [ba082960] ? kthread_create_on_node+0x270/0x270
[14479.907243] Kernel Offset: 0x3900 from 0x8100 (relocation range:
Re: tbench regression in 2.6.25-rc1
On Mon, 18 Feb 2008 16:12:38 +0800, Zhang, Yanmin said:
> I also think __refcnt is the key. I did a new testing by adding 2 unsigned long
> padding before lastuse, so the 3 members are moved to next cache line. The
> performance is recovered.
>
> How about below patch? Almost all performance is recovered with the new patch.
>
> Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]

Could you add a comment someplace that says "refcnt wants to be on a different cache line from input/output/ops or performance tanks badly", to warn some future kernel hacker who starts adding new fields to the structure?
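To make the cache-line point concrete, here is a small sketch that lays out a made-up structure with Python's ctypes and reports which 64-byte line each field falls on, with and without two words of padding in front of lastuse. Everything about the layout is an assumption for illustration: the field list is hypothetical and much shorter than the real kernel structure, the 64-byte line size is assumed, and the patch under discussion is not reproduced here.

```python
import ctypes

CACHE_LINE = 64  # bytes; an assumed line size, not taken from the thread


def make_entry(pad_words: int):
    """Build a hypothetical struct; pad_words 8-byte pads go before lastuse."""
    fields = [
        ("input", ctypes.c_uint64),      # read-mostly function pointers
        ("output", ctypes.c_uint64),
        ("ops", ctypes.c_uint64),
        ("other", ctypes.c_uint64 * 3),  # stand-in for unrelated members
    ]
    if pad_words:
        fields.append(("pad", ctypes.c_uint64 * pad_words))
    fields += [
        ("lastuse", ctypes.c_uint64),    # the three frequently written members
        ("refcnt", ctypes.c_int32),
        ("use", ctypes.c_int32),
    ]

    class Entry(ctypes.Structure):
        _fields_ = fields

    return Entry


def cache_line(struct_type, field: str) -> int:
    """Index of the cache line holding the first byte of the field."""
    return getattr(struct_type, field).offset // CACHE_LINE


if __name__ == "__main__":
    for pads in (0, 2):
        entry = make_entry(pads)
        lines = {name: cache_line(entry, name)
                 for name in ("input", "output", "ops", "lastuse", "refcnt", "use")}
        print(f"pad_words={pads}: {lines}")
```

In this toy layout the written members share line 0 with input/output/ops until the padding pushes them to line 1, which is the kind of invariant a source comment would protect against later field additions.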
Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...
On Sun, 13 Jan 2008 02:35:33 EST, [EMAIL PROTECTED] said:
> I'm seeing problems with Sendmail on 24-rc6-mm1, where the main Sendmail is
> listening on ::1/25, and Fetchmail connects to 127.0.0.1:25 to inject mail it
> has just fetched from an outside server via IMAP - it will often just hang and
> not make any further progress. Looking at netstat shows something interesting:
>
> % netstat -n -a -A inet | grep 25
> tcp 0 5108 127.0.0.1:59355 127.0.0.1:25 ESTABLISHED

The IPv6 is apparently a red herring - this morning I'm seeing the same problem with another totally separate pair of programs that are IPv4-only, hanging on loopback.
Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...
On Mon, 14 Jan 2008 11:36:40 EST, Paul Moore said:
> Are you still only seeing these problems on loopback? I can't help but
> wonder if this is the skb_clone() problem where it wasn't copying
> skb->iif causing SELinux to silently drop the packets.

Yes, I've only spotted it on loopback. The odd part is that I had reverted the one commit 9c6ad8f6895db7a517c04c2147cb5e7ffb83a315 "Convert the netif code to use ifindex values" - so either I managed to get the revert terribly wrong, or there's something else odd going on.

The first time around, I was seeing hangs during a TCP 3-packet handshake - this time data flows for some number of packets before hanging.

I'm pulling git://git.infradead.org/users/pcmoore/lblnet-2.6_testing at the moment, and seeing if there's already a fix in there for this.
Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...
On Mon, 14 Jan 2008 13:05:48 EST, [EMAIL PROTECTED] said:
> I'm pulling git://git.infradead.org/users/pcmoore/lblnet-2.6_testing at the
> moment, and seeing if there's already a fix in there for this.

Apparently the only new commit in there since the tree that was in 24-rc6-mm1 is 5d95575903fd3865b884952bd93c339d48725c33 adding some warning printk's. Would it be more productive to test against the full tree, or leaving out the one commit I already reverted?
Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...
On Mon, 14 Jan 2008 13:22:10 EST, [EMAIL PROTECTED] said:
> Apparently the only new commit in there since the tree that was in 24-rc6-mm1
> is 5d95575903fd3865b884952bd93c339d48725c33 adding some warning printk's.
> Would it be more productive to test against the full tree, or leaving out the
> one commit I already reverted?

<voice="Emily Litella">Nevermind...</voice> :)

The new commit won't apply with the other one reverted - it patches security/selinux/netnode.c which was created by the problematic commit...
Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...
On Mon, 14 Jan 2008 14:07:46 EST, Paul Moore said:
> There have been quite a few changes in lblnet-2.6_testing since
> 2.6.24-rc6-mm1 so I would recommend taking the whole tree. I'm also not
> quite sure if

Weird. I did a 'git clone git://git.infradead.org/users/pcmoore/lblnet-2.6_testing' into a new directory this morning, and doing a 'git log' against that only showed the one added commit:

commit 5d95575903fd3865b884952bd93c339d48725c33
Author: Paul Moore [EMAIL PROTECTED]
Date: Wed Jan 9 15:30:23 2008 -0500

    SELinux: Add warning messages on network denial due to error

    Currently network traffic can be silently dropped due to non-avc errors which
    can lead to much confusion when trying to debug the problem. This patch adds
    warning messages so that when these events occur there is a user visible
    notification.

    Signed-off-by: Paul Moore [EMAIL PROTECTED]

commit 9259ca5fd8b9fbdd2c3edade593dead905d8391e
Author: Paul Moore [EMAIL PROTECTED]
Date: Wed Jan 9 15:30:23 2008 -0500

    SELinux: Add network ingress and egress control permission checks

(already in 24-rc6-mm1). Somebody please tell me it's my git-idiocy..

> simply reverting the "Convert the netif code to use ifindex values" patch
> would solve the problem as there are other patches in the rc6-mm1 tree
> that rely on skb->iif being valid (new code, not converted code).

That would explain why I'm still seeing issues..

> If you want to stick with a _relatively_ vanilla rc6-mm1 tree I would
> leave everything in and simply apply the following patch which solved
> the skb_clone()/iif problem:
>
> http://git.infradead.org/?p=users/pcmoore/lblnet-2.6_testing;a=commitdiff;h=02f1c89d6e36507476f78108a3dcc78538be460b

OK, I'll go look at that..
Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...
On Mon, 14 Jan 2008 14:07:46 EST, Paul Moore said:
> http://git.infradead.org/?p=users/pcmoore/lblnet-2.6_testing;a=commitdiff;h=02f1c89d6e36507476f78108a3dcc78538be460b

Initial testing indicates that 2.6.24-rc6-mm1 plus this one commit is behaving itself correctly - my Tcl test case that reliably demonstrated wedges during SYN handling is definitively fixed, and the current issue with hangs with data pending seems to be gone as well (after admittedly light testing).

Thanks for finding the commit that fixed it...
2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...
I'm seeing problems with Sendmail on 24-rc6-mm1, where the main Sendmail is listening on ::1/25, and Fetchmail connects to 127.0.0.1:25 to inject mail it has just fetched from an outside server via IMAP - it will often just hang and not make any further progress. Looking at netstat shows something interesting:

% netstat -n -a -A inet | grep 25
tcp 0 5108 127.0.0.1:59355 127.0.0.1:25 ESTABLISHED
% netstat -n -a -A inet6 | grep 25
tcp 0 0 :::25 :::* LISTEN
tcp 0 0 :::127.0.0.1:25 :::127.0.0.1:59355 ESTABLISHED
% netstat -n -a -A inet | grep 25
tcp 0 5108 127.0.0.1:59355 127.0.0.1:25 ESTABLISHED
% netstat -n -a -A inet6 | grep 25
tcp 0 0 :::25 :::* LISTEN
tcp 0 0 :::127.0.0.1:25 :::127.0.0.1:59355 ESTABLISHED
% netstat -n -a -A inet | grep 25
tcp 0 5108 127.0.0.1:59355 127.0.0.1:25 ESTABLISHED
% netstat -n -a -A inet6 | grep 25
tcp 0 0 :::25 :::* LISTEN
tcp 0 0 :::127.0.0.1:25 :::127.0.0.1:59355 ESTABLISHED

On the IPv4 side, it thinks it's got 5108 bytes in the send queue - but on the IPv6 side of that same connection, it's showing 0 in the receive queue, and we're stuck there.

It's not consistent - sometimes Fetchmail will wedge on the very first mail, and do so several times in a row. Other times, it will do well for a while - at the moment, it's gone through 471 of the 1,470 currently queued mails just fine, only to get wedged again on number 472.
For what it's worth, here's what 'echo w > /proc/sysrq-trigger' got, although I don't see anything that looks odd to me given the netstat output above - procmail has sent data, and is waiting for a response back, and sendmail is waiting for data to arrive:

fetchmail S 8053c520 5360 17612 9902
 81007d37bb08 0086 000200d0 81006bf826c0 80687360 81006bf82918 0001 0003 81007d37bb88
Call Trace:
 [80522682] schedule_timeout+0x22/0xb4
 [80523bd3] _spin_lock_bh+0x11/0x38
 [80523ac4] _spin_unlock_bh+0x1e/0x20
 [8047cec6] release_sock+0xa3/0xac
 [8047d98f] sk_wait_data+0x8a/0xcf
 [80249b99] autoremove_wake_function+0x0/0x38
 [804abdca] tcp_recvmsg+0x35a/0x86b
 [8047c7be] sock_common_recvmsg+0x32/0x47
 [803288be] selinux_socket_recvmsg+0x1d/0x1f
 [8047af38] sock_recvmsg+0x10e/0x12f
 [80249b99] autoremove_wake_function+0x0/0x38
 [8032425d] avc_has_perm+0x4c/0x5e
 [803ac952] pty_write+0x3a/0x44
 [80249dd8] remove_wait_queue+0x2f/0x3b
 [8047c06b] sys_recvfrom+0xa4/0xf5
 [8024c850] hrtimer_start+0x11f/0x131
 [8023aa6e] do_setitimer+0x184/0x326
 [8020c03b] system_call_after_swapgs+0x7b/0x80

sendmail S 81007d30a400 5360 17613 16992
 81006bc419e8 0086 81006bc41998 8023f6a5 81007d30a400 81007d24f200 81007d30a658 00010286 81006bc419e8 8023f851 4789b768 81000100eb20
Call Trace:
 [8023f6a5] lock_timer_base+0x26/0x4a
 [8023f851] __mod_timer+0xc4/0xd6
 [805226ed] schedule_timeout+0x8d/0xb4
 [8023f37c] process_timeout+0x0/0xb
 [805226e8] schedule_timeout+0x88/0xb4
 [8029cd26] do_select+0x4a9/0x50b
 [8029d22d] __pollwait+0x0/0xdf
 [8022d7b9] default_wake_function+0x0/0xf
 [80523bd3] _spin_lock_bh+0x11/0x38
 [8047cf74] lock_sock_nested+0xa5/0xb2
 [80523bd3] _spin_lock_bh+0x11/0x38
 [80523ac4] _spin_unlock_bh+0x1e/0x20
 [8047cec6] release_sock+0xa3/0xac
 [804ac1c9] tcp_recvmsg+0x759/0x86b
 [8047c7be] sock_common_recvmsg+0x32/0x47
 [803288be] selinux_socket_recvmsg+0x1d/0x1f
 [8047a924] sock_aio_read+0x121/0x139
 [8032425d] avc_has_perm+0x4c/0x5e
 [8029cf7a] core_sys_select+0x1f2/0x2a0
 [80282f50] page_add_new_anon_rmap+0x20/0x22
 [803251f5] file_has_perm+0xa5/0xb4
 [80249b99] autoremove_wake_function+0x0/0x38
 [8029d45c] sys_select+0x150/0x17b
 [8020c03b] system_call_after_swapgs+0x7b/0x80

Any ideas?
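Background for the netstat listings in this report: when an IPv4 client connects to a listener bound on an IPv6 socket, the IPv6 side normally shows the peer as an IPv4-mapped address (::ffff:a.b.c.d), which is why one connection appears in both the inet and inet6 listings. The addresses in the listing above look partly mangled by the archive, so the mapped form used below is an assumption, not a quote. A minimal sketch with Python's ipaddress module; same_endpoint is a made-up helper name:

```python
import ipaddress


def same_endpoint(v4_text: str, v6_text: str) -> bool:
    """True if v6_text is the IPv4-mapped IPv6 form of v4_text."""
    mapped = ipaddress.IPv6Address(v6_text).ipv4_mapped  # None if not ::ffff:/96
    return mapped is not None and mapped == ipaddress.IPv4Address(v4_text)


if __name__ == "__main__":
    # Assumed mapped form of the loopback peer in the report.
    print(same_endpoint("127.0.0.1", "::ffff:127.0.0.1"))
    # The native IPv6 loopback is a different address entirely.
    print(same_endpoint("127.0.0.1", "::1"))
```

This only explains why the connection shows up twice; it says nothing about why 5108 bytes sit in the send queue while the receive side reports 0.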
2.6.24-rc5-mm1 - IPv6 throws section mismatches.
On Thu, 13 Dec 2007 02:40:50 PST, Andrew Morton said:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc5/2.6.24-rc5-mm1/

git-net.patch (I'm guessing one of Daniel's commits, but not sure which one) causes some complaints:

  LD vmlinux.o
  MODPOST vmlinux.o
WARNING: vmlinux.o(.init.text+0x2263f): Section mismatch: reference to .exit.text:tcpv6_exit (between 'inet6_init' and 'ac6_proc_init')
WARNING: vmlinux.o(.init.text+0x22644): Section mismatch: reference to .exit.text:udplitev6_exit (between 'inet6_init' and 'ac6_proc_init')
WARNING: vmlinux.o(.init.text+0x22649): Section mismatch: reference to .exit.text:udpv6_exit (between 'inet6_init' and 'ac6_proc_init')
WARNING: vmlinux.o(.init.text+0x22658): Section mismatch: reference to .exit.text:addrconf_cleanup (between 'inet6_init' and 'ac6_proc_init')
WARNING: vmlinux.o(.init.text+0x226bc): Section mismatch: reference to .exit.text:rawv6_exit (between 'inet6_init' and 'ac6_proc_init')

Looks like the problem is that tcpv6_exit and friends are called from net/ipv6/af_inet6.c:inet6_init() - which is declared as:

static int __init inet6_init(void)

I can see how calling an __exit from an __init would be Bad Juju...
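When a build throws a pile of these warnings, it can help to boil them down to "which .exit.text symbols are referenced, and from where". The sketch below does that with a regular expression written against the exact wording of the five warnings quoted above; other modpost versions may word the message differently, so treat the pattern as an assumption tied to this output.

```python
import re

# The five warnings from the report, one per line.
WARNINGS = """\
WARNING: vmlinux.o(.init.text+0x2263f): Section mismatch: reference to .exit.text:tcpv6_exit (between 'inet6_init' and 'ac6_proc_init')
WARNING: vmlinux.o(.init.text+0x22644): Section mismatch: reference to .exit.text:udplitev6_exit (between 'inet6_init' and 'ac6_proc_init')
WARNING: vmlinux.o(.init.text+0x22649): Section mismatch: reference to .exit.text:udpv6_exit (between 'inet6_init' and 'ac6_proc_init')
WARNING: vmlinux.o(.init.text+0x22658): Section mismatch: reference to .exit.text:addrconf_cleanup (between 'inet6_init' and 'ac6_proc_init')
WARNING: vmlinux.o(.init.text+0x226bc): Section mismatch: reference to .exit.text:rawv6_exit (between 'inet6_init' and 'ac6_proc_init')
"""

# Captures (referenced exit symbol, symbol that starts the enclosing range).
PATTERN = re.compile(r"reference to \.exit\.text:(\w+) \(between '(\w+)' and")


def exit_references(log: str) -> list[tuple[str, str]]:
    """Return (exit_symbol, enclosing_symbol) pairs found in a modpost log."""
    return PATTERN.findall(log)


if __name__ == "__main__":
    for exit_symbol, caller in exit_references(WARNINGS):
        print(f"{caller} -> {exit_symbol}")
```

All five references fall in the range that starts at inet6_init, which matches the diagnosis that the __init function is calling the __exit cleanup routines.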
Re: namespace support requires network modules to say GPL
On Sun, 02 Dec 2007 13:51:04 GMT, Alan Cox said:
> On Sat, 1 Dec 2007 16:30:35 -0800
> > I spoke too soon earlier, ndiswrapper builds and loads against current
> > 2.6.24-rc3. Vmware and proprietary VPN software probably do not.
> > Once again I don't give a damn, but the enterprise distro vendors certainly care.
>
> Enterprise distro vendors ship kernels from the 2.6.19 era, so I don't
> see why they care.

They don't care *now*. They will care when they try to rev forward from .19.

Not that they'll care a *lot* - it took *me* all of about an hour to get VMware Server 1.0.4 working under -rc3-mm2. Probably will take an enterprise distro 4-5 hours, 30 mins for the port and 4 1/2 hours for the paperwork. :)
2.6.23-mm1 tg3 wake-on-lan oddity...
Scenario - Dell Latitude D820 laptop, tg3 driver says this at boot:

eth0: Tigon3 [partno(BCM5752KFBG) rev 6002 PHY(5752)] (PCI Express) 10/100/1000Base-T Ethernet 00:15:c5:c8:33:4e
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[7618] dma_mask[64-bit]

# (lspci; lspci -n) | grep 09
09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5752 Gigabit Ethernet PCI Express (rev 02)
09:00.0 0200: 14e4:1600 (rev 02)

(I think that's most of the likely-relevant info...)

Issue: I (for unrelated reasons) run powertop, and it suggests I conserve power by doing 'ethtool -s eth0 wol d'. I look at it, and think that it's daft, because (a) the Dell factory default is WOL disabled and (b) if it wasn't the default, I'd have *set* it to disabled, and (c) I even went back and rebooted and checked the BIOS setting - disabled. Nonetheless:

# ethtool eth0 | grep Wake
        Supports Wake-on: g
        Wake-on: g

Is this expected behavior?
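For anyone decoding the single-letter Wake-on flags: 'g' means wake on MagicPacket and 'd' means disabled, so the output above says WoL is both supported and currently armed. Here is a small sketch that parses the two lines and expands the letters. The letter table follows the ethtool man page as far as I recall it; newer ethtool versions may define additional letters, and parse_wol is a made-up helper name.

```python
# Wake-on-LAN option letters as documented for ethtool's "wol" setting.
WOL_FLAGS = {
    "p": "PHY activity",
    "u": "unicast",
    "m": "multicast",
    "b": "broadcast",
    "a": "ARP",
    "g": "MagicPacket",
    "s": "SecureOn password",
    "d": "disabled",
}

# The two lines from the report, as 'ethtool eth0 | grep Wake' printed them.
SAMPLE = "\tSupports Wake-on: g\n\tWake-on: g\n"


def parse_wol(ethtool_output: str) -> dict[str, list[str]]:
    """Map 'Supports Wake-on' / 'Wake-on' to lists of decoded flag names."""
    result = {}
    for line in ethtool_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ("Supports Wake-on", "Wake-on"):
            result[key] = [WOL_FLAGS.get(ch, "unknown") for ch in value.strip()]
    return result


if __name__ == "__main__":
    for key, flags in parse_wol(SAMPLE).items():
        print(f"{key}: {', '.join(flags)}")
```

After 'ethtool -s eth0 wol d' the second line would be expected to decode to "disabled", while the "Supports" line stays as it is.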
Re: 2.6.23-mm1 tg3 wake-on-lan oddity...
On Tue, 27 Nov 2007 08:04:28 PST, Michael Chan said:
> [EMAIL PROTECTED] wrote:
> > (a) the Dell factory default is WOL disabled and (b) if it wasn't the
> > default, I'd have *set* it to disabled, and (c) I even went back and
> > rebooted and checked the BIOS setting - disabled. Nonetheless:
> >
> > # ethtool eth0 | grep Wake
> > Supports Wake-on: g
> > Wake-on: g
> >
> > Is this expected behavior?
>
> The new tg3 is supposed to follow the WoL setting in the NVRAM, so this
> is not expected. We'll have to look into this.

Any info that would help? printk's to stick in tg3.c? Dumping the relevant bytes of NVRAM? etc?
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Tue, 27 Nov 2007 15:12:42 +0100, Andi Kleen said:
> > OK, short of making IPv4 a module (which would be a worthy task :)
>
> At some point there were patches, it is probably not very difficult. But
> DaveM resisted at some point because he didn't want people to replace the
> network stack (although I personally don't have a problem with that)

Personally, I wouldn't find replacing the ipv4 stack very interesting. However, stress-testing your system for IPv6-readiness by doing 'rmmod ipv4' :)

(Though I admit it's something I'd *try* if it was available, but certainly not sufficient for a reason to do it...)
Re: 2.6.23-mm1 tg3 wake-on-lan oddity...
On Tue, 27 Nov 2007 13:34:57 PST, Michael Chan said:
> Ideally, the BIOS should modify the NVRAM's setting when it is changed.
> We will talk to Dell to get their opinion on this as this is very
> confusing to the user.

That would certainly explain what I'm seeing, and I can certainly wait if the answer is indeed "Buggy BIOS, fixed in D820 A08 or A09 or whenever". (If Dell cares, I'm at BIOS A07 already).

In the meantime, I just stuck an 'ethtool -s eth0 wol d' in /etc/rc.local until a proper fix shows up.
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Tue, 27 Nov 2007 22:09:42 +0100, Adrian Bunk said:
> Are there common reasons why these drivers are not upstream?

Well, on my laptop, I'm currently dragging along 3 out-of-tree kernel modules. 2 are well-known binary blobs so it's between me and the vendor, as usual. The third is a USB webcam driver that happened to get caught at the wrong end of the colorspace-conversion-in-kernel bunfight in the V4L playpen.

Somebody wants to figure out how to get the gspca drivers into the kernel, they're at http://mxhaard.free.fr/download.html waiting for attention. ;)

(Don't look at *me*, I don't understand the code, or the bunfight - I just happen to have one of the 244 supported webcams, and it works with that driver)
Re: [RFD] iptables: mangle table obsoletes filter table
On Sun, 21 Oct 2007 07:31:58 +0300, Al Boldi said:
> Well, for example to stop any transient packets being forwarded. You could
> probably hack around this using mark's, but you can't stop the implied
> route lookup, unless you stop it in prerouting.
>
> Basically, you have one big unintended gaping hole in your firewall, that
> could easily be exploited for DoS attacks at the least, unless you put in
> specific rules to limit this.

OK, the light bulb just went on... ;)

We actually *do* have an issue with the flip side of that - it's a frikking pain to make packets that show up on eth0 with a destination of 127.0.0.1 go away un-noticed - or at least I'm assuming it's the flip side of the same issue.
Re: [RFD] iptables: mangle table obsoletes filter table
On Sat, 20 Oct 2007 06:40:02 +0300, Al Boldi said:
> Sure, the idea was to mark the filter table obsolete as to make people start
> using the mangle table to do their filtering for new setups. The filter
> table would then still be available for legacy/special setups. But this
> would only be possible if we at least ported the REJECT target to mangle.

That's *half* the battle. The other half is explaining why I should move from a perfectly functional setup that uses the filter table. What gains do I get from doing so? What isn't working that I don't know about? etc?

In other words - why do I want to move from filter to mangle?
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
On Fri, 21 Sep 2007 19:05:23 BST, Denys Vlasenko said:
> I plan to use gzip compression on following drivers' firmware,
> if patches will be accepted:
>
>    text    data     bss     dec     hex filename
>   17653  109968     240  127861   1f375 drivers/net/acenic.o
>    6628  120448       4  127080   1f068 drivers/net/dgrs.o
>          ^^

Should this be redone to use the existing firmware loading framework to load the firmware instead?
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
On Fri, 21 Sep 2007 20:18:06 BST, Denys Vlasenko said:
> On Friday 21 September 2007 19:36, [EMAIL PROTECTED] wrote:
> > Should this be redone to use the existing firmware loading framework to
> > load the firmware instead?
>
> Not in every case.
>
> For example, bnx2 maintainer says that driver and firmware are closely
> tied for his driver. IOW: you upgrade kernel and your NIC is not working
> anymore.
>
> Another argument is to make kernel be able to bring up NICs without
> needing firmware images in initramfs/initrd/hard drive.

OK, I can live with "considered and decided against". :)
Re: follow-up: discrepancy with POSIX
On Wed, 19 Sep 2007 11:38:57 PDT, Rick Jones said:
> One has to set their way-back machine pretty far back to find the *BSD
> bits which used 0.0.0.0 as the all nets, all subnets (to mis-use a
> term) broadcast IPv4 address when sending. Perhaps as far back as the
> time before HP-UX 7 or SunOS4. The bit errors in my dimm memory get
> pretty dense that far back...

That would be BSD4.2 - BSD4.3 went to all-ones, and it *was* quite the little mess if you had both flavors of boxes on the same subnet at the same time, it would packet-storm *quite* easily.
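To make the two conventions concrete: on a given subnet, the old style puts all zeros in the host bits of the broadcast address and the later style puts all ones there. A minimal sketch with Python's ipaddress module follows; the subnet 192.0.2.0/24 is a documentation range picked purely as an example, and the dictionary labels are mine, not terminology from the thread.

```python
import ipaddress


def broadcast_conventions(cidr: str) -> dict[str, ipaddress.IPv4Address]:
    """Show the all-zeros and all-ones host-part addresses for a subnet."""
    net = ipaddress.IPv4Network(cidr)
    return {
        "host bits all zeros (4.2BSD style)": net.network_address,
        "host bits all ones (4.3BSD and later)": net.broadcast_address,
    }


# Hypothetical example subnet (documentation range).
EXAMPLE = broadcast_conventions("192.0.2.0/24")

if __name__ == "__main__":
    for label, address in EXAMPLE.items():
        print(f"{label}: {address}")
```

Two hosts that disagree on which of these addresses means "everyone" will each treat the other's broadcasts as ordinary traffic addressed to someone else, which is one way the mess described above could get started.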
Re: [PATCH]: xfrm audit calls
On Tue, 11 Sep 2007 19:03:14 CDT, Joy Latten said:
> This patch modifies the current ipsec audit layer
> by breaking it up into purpose driven audit calls.
>
> So far, the only audit calls made are when add/delete
> an SA/policy.

What other audit calls do you envision adding in the future?
Re: That whole Linux stealing our code thing
On Sun, 02 Sep 2007 03:55:37 +0200, Adrian Bunk said:
> Jiri's patch would have wrongly not only removed the BSD statement from
> dual licenced files but also from not dual licenced files. This was a
> mistake in this patch (that was never merged into the tree) neither Jiri
> nor Alan noticed.

You know, we *could* have solved this a *hell* of a lot easier if people quit flaming about it, and we did something *productive* instead.

Like submit a corrected patch. :)
Re: That whole Linux stealing our code thing
On Sun, 02 Sep 2007 01:09:18 EDT, Constantine A. Murenin said:
> The idea here is that no patching was needed in the first place -- most
> of the files are/were BSD-licensed, because they were forked from
> OpenBSD.

Oh, silly me. For some reason, I had it in my head that Jiri's original patch actually included some real live *code* in addition to the parts that changed the licensing text... ;)
Re: [PATCH 4/5] Net: ath5k, license is GPLv2
On Tue, 28 Aug 2007 18:11:55 BST, Christoph Hellwig said:
> On Tue, Aug 28, 2007 at 12:00:50PM -0400, Jiri Slaby wrote:
> > ath5k, license is GPLv2
> >
> > The files are available only under GPLv2 since now.
>
> Is this really a good idea? Most of the reverse-engineering was done by
> the OpenBSD folks, and it would certainly be helpful to work together
> with them on new hardware revisions, etc..

The heck with good idea - it's unclear to me if Jiri is even *allowed* to remove the BSD/other license. Jiri can release *his* code as GPLv2 only, but I suspect the files as a whole really should be dual BSD/GPLv2, due to the numerous other stakeholders in those files.
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Tue, 21 Aug 2007 09:16:43 PDT, Paul E. McKenney said:

> I agree that instant gratification is hard to come by when synching up compiler and kernel versions. Nonetheless, it should be possible to create APIs that are conditioned on the compiler version.

We've tried that, sort of. See the mess surrounding the whole extern/static/inline/__whatever boondoggle, which seems to have changed semantics in every single gcc release since 2.95 or so. And recently mention was made that gcc4.4 will have *new* semantics in this area.

Yee. Hah.
Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
On Sat, 11 Aug 2007 02:38:40 +0200, Segher Boessenkool said:

> That means GCC cannot compile Linux; it already optimises some accesses to scalars to smaller accesses when it knows it is allowed to. Not often though, since it hardly ever helps in the cost model it employs.
>
> Please give an example code snippet + gcc version + arch to back this up.
>
> unsigned char f(unsigned long *p) { return *p & 1; }

Not really valid, because it's still able to do one atomic access to compute the result. Now, if you had found an example where it converts a 32-bit atomic access into 2 separate 16-bit accesses that weren't atomic as a whole
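To make the distinction concrete, here is a minimal C sketch; it is an illustration only, not code from the thread or from the kernel. `low_bit` is the quoted snippet under a different name: a compiler may narrow its load to a single byte, but that is still one access. `read_full_width` is a hypothetical helper that reads through a volatile-qualified lvalue, which in practice asks the compiler to perform the load as written instead of narrowing or eliding it. Both function names are assumptions made for this example.

```c
/* The quoted snippet: only the low bit is needed, so the compiler is free
 * to load a single byte instead of the whole word. That is still a single
 * access, so it does not demonstrate a torn (split) read. */
unsigned char low_bit(unsigned long *p)
{
	return *p & 1;
}

/* Hypothetical helper, not from the thread: the volatile-qualified lvalue
 * requests that the load be carried out as written, at the declared width. */
unsigned long read_full_width(unsigned long *p)
{
	return *(volatile unsigned long *)p;
}
```

Neither function shows the problematic case described above, which would be one 32-bit access silently turned into two 16-bit accesses.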
Re: [dm-devel] Re: [2.6.23 PATCH 13/18] dm: netlink
On Thu, 12 Jul 2007 19:00:59 PDT, Mike Anderson said:

> No, all admin tools and interfaces function as they do today. The dm-netlink patch series only contains 9 deletions (actual just one true deletion of existing kernel code the others are due to break up of the patch into compilable chunks). The intent was not to break users or force migration.

OK, I'll bite - if the admin tools function as before, who is *using* the code? Or do the admin tools have a preferred netlink component and a fallback set of code if that fails?
Re: [2.6 patch] cleanup congestion control options
On Sat, 14 Jul 2007 06:09:54 +0200, Adrian Bunk said:

> This patch contains the following cleanups:
> - note in the prompt if an option depends on EXPERIMENTAL

Who decided whether a particular option is 'experimental' or not?

>   Lawrence S. Brakmo and Larry L. Peterson. TCP Vegas: end to end congestion avoidance on a global Internet. IEEE Journal on Selected Areas in Communications, 13(8), October 1995.
>
>  config TCP_CONG_VEGAS
> -    tristate "TCP Vegas"
> +    tristate "TCP Vegas (EXPERIMENTAL)"
>      depends on EXPERIMENTAL

I think the *right* fix here is '- depends on EXPERIMENTAL'. 1995. Geez. :)

(Probably *all* of the 'experimental' tags for congestion control need an in-depth review - but Vegas struck me as particularly egregious.)
2.6.22-rc3-mm1 - pppd hanging in netdev_run_todo while holding mutex
On Wed, 30 May 2007 23:58:23 PDT, Andrew Morton said:

> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc3/2.6.22-rc3-mm1/

Under 22-rc2-mm1, if my VPN connection got reset, ppp0 just quietly went away. Under 22-rc3-mm1, it seems to end up wedged and waiting for references to go away:

Jun 4 09:23:01 turing-police kernel: [90089.270707] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8
Jun 4 09:23:11 turing-police kernel: [90099.396121] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8
Jun 4 09:23:21 turing-police kernel: [90109.520574] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8
Jun 4 09:23:32 turing-police kernel: [90119.653129] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8

'echo t > /proc/sysrq-trigger' shows pppd hung up here:

Jun 4 10:52:57 turing-police kernel: [95478.047892] pppd D 000105ad3830 4968 3815 1 (NOTLB)
Jun 4 10:52:57 turing-police kernel: [95478.047902] 810008d5fd78 0086 81000349
Jun 4 10:52:57 turing-police kernel: [95478.047911] 810008d5fd28 810008d4a040 810003461820 810008d4a2b0
Jun 4 10:52:57 turing-police kernel: [95478.047920] 000105ad3733 0202 00ff 80239795
Jun 4 10:52:57 turing-police kernel: [95478.047928] Call Trace:
Jun 4 10:52:57 turing-police kernel: [95478.047936] [805207a2] schedule_timeout+0x8d/0xb4
Jun 4 10:52:57 turing-police kernel: [95478.047945] [805207e2] schedule_timeout_uninterruptible+0x19/0x1b
Jun 4 10:52:57 turing-police kernel: [95478.047954] [802397bb] msleep+0x14/0x1e
Jun 4 10:52:57 turing-police kernel: [95478.047963] [8048aa4e] netdev_run_todo+0x12f/0x234
Jun 4 10:52:57 turing-police kernel: [95478.047972] [8049166f] rtnl_unlock+0x35/0x37
Jun 4 10:52:57 turing-police kernel: [95478.047981] [804894a9] unregister_netdev+0x1e/0x23
Jun 4 10:52:57 turing-police kernel: [95478.047994] [88a5f2c2] :ppp_generic:ppp_shutdown_interface+0x67/0xbb
Jun 4 10:52:57 turing-police kernel: [95478.048018] [88a5f5b8] :ppp_generic:ppp_release+0x33/0x65
Jun 4 10:52:57 turing-police kernel: [95478.048028] [8028d54a] __fput+0xac/0x176
Jun 4 10:52:57 turing-police kernel: [95478.048036] [8028d628] fput+0x14/0x16
Jun 4 10:52:57 turing-police kernel: [95478.048045] [8028a9c6] filp_close+0x66/0x71
Jun 4 10:52:57 turing-police kernel: [95478.048054] [8028bd54] sys_close+0x98/0xd7
Jun 4 10:52:57 turing-police kernel: [95478.048062] [8020a03c] tracesys+0xdc/0xe1
Jun 4 10:52:57 turing-police kernel: [95478.048073] [2b45cd2429a0]

Which in itself wouldn't be so bad, except that it's holding a mutex and lots of other stuff gets wedged up waiting for it (here's 1 of 6 processes that was wedged this morning):

Jun 4 10:52:58 turing-police kernel: [95478.051129] ifconfig D 810005e19820 5800 9787 20510 (NOTLB)
Jun 4 10:52:58 turing-police kernel: [95478.051141] 81000868fd08 0082 81000868fec8 0246
Jun 4 10:52:58 turing-police kernel: [95478.051150] 00010101 810005e19820 810003fe0820 810005e19a90
Jun 4 10:52:58 turing-police kernel: [95478.051159] 0a3f26c0 0006 81000868ff28 8028aacc
Jun 4 10:52:58 turing-police kernel: [95478.051167] Call Trace:
Jun 4 10:52:58 turing-police kernel: [95478.051176] [80520bc4] __mutex_lock_slowpath+0x74/0xb6
Jun 4 10:52:58 turing-police kernel: [95478.051185] [805209f3] mutex_lock+0xe/0x10
Jun 4 10:52:58 turing-police kernel: [95478.051193] [8048a938] netdev_run_todo+0x19/0x234
Jun 4 10:52:58 turing-police kernel: [95478.051202] [8049166f] rtnl_unlock+0x35/0x37
Jun 4 10:52:58 turing-police kernel: [95478.051210] [8048a3f2] dev_ioctl+0x3e3/0x483
Jun 4 10:52:58 turing-police kernel: [95478.051218] [8047df30] sock_ioctl+0x1ef/0x1fc
Jun 4 10:52:58 turing-police kernel: [95478.051227] [802989be] do_ioctl+0x2a/0x77
Jun 4 10:52:58 turing-police kernel: [95478.051235] [80298c52] vfs_ioctl+0x247/0x264
Jun 4 10:52:58 turing-police kernel: [95478.051243] [80298cce] sys_ioctl+0x5f/0x85
Jun 4 10:52:58 turing-police kernel: [95478.051252] [8020a03c] tracesys+0xdc/0xe1

(And of course, you can't shut down cleanly, because /etc/init.d/network tries to down other interfaces on the way out, and)

I'd bisect this, except I don't have a better way to replicate it than wait for our VPN box to reset the connection after 24 hours of connect - basically means I get 2 tries per weekend.

An hour or so of digging through the -rc3-mm1 broken-out/ didn't find any obvious-to-me culprits. Any ideas/suggestions?
Re: [PATCH 1/5] AF_RXRPC: Add blkcipher accessors for using kernel data directly
On Thu, 08 Mar 2007 22:48:29 GMT, David Howells said:

> diff --git a/include/linux/crypto.h b/include/linux/crypto.h
> index 779aa78..ce092fe 100644
> --- a/include/linux/crypto.h
> +++ b/include/linux/crypto.h
> @@ -40,7 +40,10 @@
>  #define CRYPTO_ALG_LARVAL    0x0010
>  #define CRYPTO_ALG_DEAD      0x0020
>  #define CRYPTO_ALG_DYING     0x0040
> -#define CRYPTO_ALG_ASYNC     0x0080
> +
> +#define CRYPTO_ALG_CAP_MASK  0x0180 /* capabilities mask */
> +#define CRYPTO_ALG_ASYNC     0x0080 /* capable of async operation */
> +#define CRYPTO_ALG_DMA       0x0100 /* capable of using of DMA */

Would it make sense to define ALG_CAP_MASK as 0xF80 or similar, to reserve a few bits? The alternative has somebody else grabbing 0x200 for some other purpose, and then when you want to add another capability bit, you end up with a CAP_MASK of 0x580 - this way leads to madness and bugs.
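A small sketch of the suggestion, using the flag values as they appear in the quoted diff. The wider 0x0f80 mask is the reviewer's proposal, not what the patch contains, and `EXAMPLE_ALG_CAP_FUTURE`, `alg_caps()` and `alg_has_caps()` are hypothetical names invented for this illustration.

```c
#include <stdint.h>

#define CRYPTO_ALG_ASYNC        0x0080u /* capable of async operation */
#define CRYPTO_ALG_DMA          0x0100u /* capable of using DMA */

/* Proposed: reserve bits 7..11 for capabilities up front, so the mask
 * stays contiguous as new capability bits are added. */
#define CRYPTO_ALG_CAP_MASK     0x0f80u

/* Hypothetical future capability; it already falls inside the mask. */
#define EXAMPLE_ALG_CAP_FUTURE  0x0200u

/* Extract only the capability bits from a flags word. */
static inline uint32_t alg_caps(uint32_t flags)
{
	return flags & CRYPTO_ALG_CAP_MASK;
}

/* True if every capability bit in 'wanted' is set in 'flags'. */
static inline int alg_has_caps(uint32_t flags, uint32_t wanted)
{
	return (flags & CRYPTO_ALG_CAP_MASK & wanted) == wanted;
}
```

With the narrower 0x0180 mask from the patch, a later capability at 0x0400 would leave a hole at 0x0200 if that bit had been taken for something else, which is the 0x580 case described above.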
Re: Network drivers that don't suspend on interface down
On Wed, 20 Dec 2006 22:06:51 EST, Dan Williams said:

> It's also complicated because some switches are supposed to rfkill both an 802.11 module _and_ a bluetooth module at the same time, or I guess some laptops may even have one rfkill switch for each wireless device.

On my Dell D820, it's bios-selectable if the switch is enabled, or if it controls just the 802.11 card, or 802.11 and bluetooth, or just bluetooth, or 802.11 and mobile broadband, or ...

This way lies madness. :)

(Oddest part - said bios config screen offers the choices for bluetooth and mobile broadband even though the hardware config doesn't include it. ;)
Re: [PATCH 1/5] remove TxStartThresh and RxEarlyThresh
On Mon, 16 Oct 2006 07:26:37 +1000, Benjamin Herrenschmidt said:

> Somebody patented FIFO thresholds ? Gack ?

The US PTO is fundamentally busticated.

http://www.engadget.com/2006/10/14/cisco-patents-the-triple-play/

Cisco got a patent on the concept of delivering voice, internet, and cable TV over one cable. Now admittedly, when they applied for it in 2000, it wasn't a buzzword yet - but I'm pretty sure that there was prior art.

Back to the case at hand...

In the case of the TxStartThresh and RxEarlyThresh, I don't think it's FIFO thresholds per se that are a problem - the note specifically mentioned cut-through, which is a specific technique of starting to deal with the already-arrived head end of the packet *before* the tail end has arrived yet (e.g. if you read a packet that has 16 bytes of control info followed by 64 bytes of data, you have finished parsing the first 16 and have set stuff up by the time the 64 bytes starts arriving - even though you only started *one* read of 80 bytes).

Of course, even *that* is an old technique - I remember discussion (and possibly implementation) of being able to read the front of an Ethernet packet, and do the routing table lookup fast enough so that you could start transmitting the packet on the outbound interface before it had finished arriving on the inbound. Of course, this was back when Proteon and Bay were start-ups, nobody did IP option fields or router ACLs or stuff like that, and level-3 routers were not much smarter (and perhaps stupider) than today's level-2 switches that filter/route based on MAC address...

Maybe the patent is on the fact that you can't do cut-through routing well without enforcing certain relationships on the Rx and Tx FIFO thresholds...
Re: 2.6.18-mm2 - oops in cache_alloc_refill()
On Mon, 02 Oct 2006 10:52:45 PDT, Jean Tourrilhes said:

> On Fri, Sep 29, 2006 at 06:20:08PM -0700, Andrew Morton wrote:
> On Fri, 29 Sep 2006 20:01:54 -0400
>
> % grep ioctl /tmp/foo2 | sort -u | more
> ioctl(13, SIOCGIWESSID, 0xbfbcdb9c) = 0
> ioctl(13, SIOCGIWRANGE, 0xbfbcdbdc) = 0
> ioctl(13, SIOCGIWRATE, 0xbfbcdbbc) = 0
>
> Yes. The main thing which those WE-21 patches do is to shorten the size of various buffers which are used in wireless ioctls.
>
> Ok, I've found it. Actually, I feel ashamed, as it is a fairly classical buffer overflow, we put one extra char in a buffer. Now, I don't understand why it did not blow up on my box ;-)
>
> New patch. I think it is right, but I would not mind Pavel to have a look at it. On my box it does not make thing worse.
>
> Valdis : would you mind trying if this patch fix the problem you are seeing with WE-21 ? If it fixes it, I'll send it to John...

Been up and running with we-21 configured in, and gkrellm doing the monitoring that gave it indigestion. It was dying in 1-2 minutes, now been up for 30 mins with no issues.
Re: 2.6.18-mm2 - oops in cache_alloc_refill()
On Fri, 29 Sep 2006 23:31:07 EDT, [EMAIL PROTECTED] said:

> Fair enough, I'm going to try reverting the 2 commits and see if things behave better.

OK, it's definitely something in those 2 commits - I reverted them and the resulting 2.6.18-mm2 kernel has been up and stable for 4 hours, even with the problem gkrellm updating once a second the whole time.

I'm not *seeing* how those changes can cause trouble - unless it's this:

diff --git a/drivers/net/wireless/orinoco.c b/drivers/net/wireless/orinoco.c
index 1840b69..9e19a96 100644
--- a/drivers/net/wireless/orinoco.c
+++ b/drivers/net/wireless/orinoco.c
@@ -3037,7 +3037,7 @@ static int orinoco_ioctl_getessid(struct
     }

     erq->flags = 1;
-    erq->length = strlen(essidbuf) + 1;
+    erq->length = strlen(essidbuf);

Does some other code go batshit if length == 0? My current config doesn't try to actually ifup the wireless if I also have connectivity via copper (in order to avoid chewing up a DHCP lease in crowded address space if not needed).

% iwconfig eth5
eth5      IEEE 802.11b  ESSID:""  Nickname:"HERMES I"
          Mode:Managed  Frequency:2.457 GHz  Access Point: Not-Associated
          Bit Rate:11 Mb/s   Sensitivity:1/3
          Retry limit:4   RTS thr:off   Fragment thr:off
          Power Management:off
          Link Quality=0/92  Signal level=134/153  Noise level=134/153
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0

That ESSID the source of the trouble?
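The length accounting at issue can be sketched in plain C. This is a user-space illustration under stated assumptions, not the orinoco or Wireless Extensions code: `ESSID_MAX` mirrors the 32-byte ESSID limit, and `essid_copy()` is a hypothetical helper. The point is that the reported length excludes any trailing NUL, is bounded by the destination size, and may legitimately be 0 for an empty ESSID.

```c
#include <stddef.h>

#define ESSID_MAX 32 /* assumed limit, mirroring the 32-byte ESSID size */

/* Copy an ESSID into dst without a trailing NUL and return the number of
 * bytes written. Never writes more than ESSID_MAX or dst_size bytes, so a
 * full-length ESSID cannot spill one byte past the buffer. */
size_t essid_copy(char *dst, size_t dst_size, const char *src)
{
	size_t len = 0;

	while (len < ESSID_MAX && len < dst_size && src[len] != '\0') {
		dst[len] = src[len];
		len++;
	}
	return len; /* 0 for an empty ESSID; callers must cope with that */
}
```

A caller that assumes the length always counts a terminator would miscount by one in exactly the way the "+ 1" removal exposes.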
Re: 2.6.18-mm2 - oops in cache_alloc_refill()
On Fri, 29 Sep 2006 12:45:58 PDT, Andrew Morton said: (Adding a bunch of people to the cc: list now that I have a clue what is going on) I'd expect it's the same bug - slab data structures have gone bad. *bing*! We have a winner. A quick check showed the kernel wasn't built with slab debugging enabled, so I turned on the more obvious options, and got rewarded with a traceback.. Again: how come nobody else is hitting this? Something's different. gkrellm and wireless (specifically, gkrellm-wifi-0.9.12-3.fc6 from Fedora Core extras-development). Kernel is still a 2.6.18 with *only* the origin.patch from -mm2 applied. Note that the gkrellm plugin hasn't had a change in the code since 01/03/2004 - hopefully there's been no unintentional API change on the kernel side since then... Here's the traceback I got: slab error in verify_redzone_free(): cache `size-32': memory outside object was overwritten [c0103ad2] dump_trace+0x64/0x1cd [c0103c4d] show_trace_log_lvl+0x12/0x25 [c010415f] show_trace+0xd/0x10 [c01041fc] dump_stack+0x19/0x1b [c014c796] __slab_error+0x17/0x1c [c014cdac] cache_free_debugcheck+0xaf/0x230 [c014d43e] kfree+0x59/0x8c [c02dc04a] ioctl_standard_call+0x1da/0x218 [c02dc275] wireless_process_ioctl+0x55/0x312 [c02d3750] dev_ioctl+0x45f/0x49a [c02c92aa] sock_ioctl+0x1b3/0x1c6 [c0160322] do_ioctl+0x22/0x67 [c01605a5] vfs_ioctl+0x23e/0x251 [c01605ff] sys_ioctl+0x47/0x64 [c0102cd3] syscall_call+0x7/0xb DWARF2 unwinder stuck at syscall_call+0x7/0xb Leftover inexact backtrace: === de57e16c: redzone 1:0x170fc2a5, redzone 2:0x170fc200. Repeated, over and over, just about once a second. A quick strace of gkrellm finds these likely ioctl's causing the problem: % grep ioctl /tmp/foo2 | sort -u | more ioctl(13, SIOCGIWESSID, 0xbfbcdb9c) = 0 ioctl(13, SIOCGIWRANGE, 0xbfbcdbdc) = 0 ioctl(13, SIOCGIWRATE, 0xbfbcdbbc) = 0 Since I'm using an orinoco-based card, these 2 look like the most likely candidates. 
WE-21 was merged between -mm1 and -mm2, which is why -mm1 was stable for me. I'll let somebody else argue over what path these took that I never tripped over them in an earlier -mm before they hit Linus's tree...

commit baef186519c69b11cf7e48c26e75feb1e6173baa
Author: John W. Linville [EMAIL PROTECTED]
Date:   Fri Sep 8 16:04:05 2006 -0400

    [PATCH] WE-21 support (core API)

    This is version 21 of the Wireless Extensions. Changelog :
        o finishes migrating the ESSID API (remove the +1)
        o netdev->get_wireless_stats is no more
        o long/short retry

    This is a redacted version of a patch originally submitted by Jean Tourrilhes. I removed most of the additions, in order to minimize future support requirements for nl80211 (or other WE successor).

    CC: Jean Tourrilhes [EMAIL PROTECTED]
    Signed-off-by: John W. Linville [EMAIL PROTECTED]

commit eeec9f1a931262d69811135092c8447d6dccc3e6
Author: Jean Tourrilhes [EMAIL PROTECTED]
Date:   Tue Aug 29 18:02:31 2006 -0700

    [PATCH] WE-21 for orinoco

    Signed-off-by: Jean Tourrilhes [EMAIL PROTECTED]
    Signed-off-by: John W. Linville [EMAIL PROTECTED]
Re: 2.6.18-mm2 - oops in cache_alloc_refill()
On Fri, 29 Sep 2006 18:40:43 PDT, Jean Tourrilhes said:

> On Fri, Sep 29, 2006 at 06:20:08PM -0700, Andrew Morton wrote:
> On Fri, 29 Sep 2006 20:01:54 -0400
>
> A quick strace of gkrellm finds these likely ioctl's causing the problem:
>
> % grep ioctl /tmp/foo2 | sort -u | more
> ioctl(13, SIOCGIWESSID, 0xbfbcdb9c) = 0
> ioctl(13, SIOCGIWRANGE, 0xbfbcdbdc) = 0
> ioctl(13, SIOCGIWRATE, 0xbfbcdbbc) = 0
>
> Excuse me, can you point out which version of gkrellm you use and where to find it, the only version that is listed on my page does not use the ESSID ioctl. I want to be sure I'm looking at the same thing as you are...

All the pieces: http://download.fedora.redhat.com/pub/fedora/linux/extras/development/SRPMS/

The particular plugin causing the trouble: http://download.fedora.redhat.com/pub/fedora/linux/extras/development/SRPMS/gkrellm-wifi-0.9.12-3.fc6.src.rpm

If you're not on a box that has rpm2cpio or similar, yell and I'll break that .src.rpm up for you - there's basically just an 18K .tar.gz and a 14K patch in there.
Re: 2.6.18-mm2 - oops in cache_alloc_refill()
On Fri, 29 Sep 2006 18:33:48 PDT, Jean Tourrilhes said:

> On Fri, Sep 29, 2006 at 06:20:08PM -0700, Andrew Morton wrote:
> On Fri, 29 Sep 2006 20:01:54 -0400
>
> Here's the traceback I got:
>
> slab error in verify_redzone_free(): cache `size-32': memory outside object was overwritten
>
> Hum... Not clear what's happening. I'll look more into it on monday.

Fair enough, I'm going to try reverting the 2 commits and see if things behave better.

> I'm using Orinoco, I've not seen that with iwconfig. I'll look into that...

I'll bet it's the difference between a modern iwconfig and a 3-year-old stone-age gkrellm plugin :)
Re: [PATCH 00/03][RESUBMIT] net: EtherIP tunnel driver
On Sat, 23 Sep 2006 15:27:36 +0200, Joerg Roedel said:

> (I assume you are speaking of the position of the 3 in the header). The RFC is not clear at this point. It defines that the first 4 bits in the 16 bit Ethernet header MUST be 0011. But it don't defines the byteorder of that 16 bit word nor if the least or most significant bit comes first.

Unless stated otherwise, it's pretty safe to assume that all on the wire data mentioned in an RFC is in 'network byte order'. That's why hton*() and ntoh*() functions exist...

Is there something in the RFC that suggests that a byte order other than 'network order' is possible/acceptable there?
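As a sketch of what the network-byte-order reading implies, the following C fragment builds the 16-bit header with the version value 3 (binary 0011) in the four most significant bits and converts it with htons(). It assumes a POSIX system for <arpa/inet.h>; the function names are hypothetical and this is not the code from the patch under review.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define ETHERIP_VERSION 3 /* the "0011" from the quoted text */

/* Write the 16-bit header to the wire buffer, most significant byte first. */
void etherip_put_header(uint8_t wire[2])
{
	uint16_t hdr = htons((uint16_t)(ETHERIP_VERSION << 12));

	memcpy(wire, &hdr, sizeof(hdr));
}

/* Read the version back from a received header. */
int etherip_get_version(const uint8_t wire[2])
{
	uint16_t hdr;

	memcpy(&hdr, wire, sizeof(hdr));
	return ntohs(hdr) >> 12;
}
```

Under this interpretation the first byte on the wire is 0x30 regardless of host endianness, which is the property hton*() and ntoh*() exist to provide.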
Re: 2.6.18-rc7-mm1: networking breakage on HPC nx6325 + SUSE 10.1
On Tue, 19 Sep 2006 23:30:34 +0200, Rafael J. Wysocki said: Well, I can configure the interfaces manually, with ifconfig, but the SUSE's configuration tools don't work. For example, ifup eth0 tells me that No configuration found for eth0 and that's all. I'm seeing issues on a Dell Latitude C840 as well, but I'm not positive it's the same bug(s). The problem I'm seeing is that device renaming is failing (I have up to 5 different ethernet-ish interfaces that can be connected, so I abuse /sbin/nameif extensively. There seem to be some other issues with pcmcia, but it's not clear what the problem is - it manages to find the (normally down) ethernet on my Xircom card, but the orinoco driver seems unable to find my wireless card For instance, under 2.6.18-rc6-mm2, I see: pccard: CardBus card inserted into slot 0 PCI: Enabling device :03:00.0 ( - 0003) ACPI: PCI Interrupt :03:00.0[A] - Link [LNKD] - GSI 11 (level, low) - IRQ 11 PCI: Setting latency timer of device :03:00.0 to 64 eth2: Xircom cardbus revision 3 at irq 11 PCI: Enabling device :03:00.1 ( - 0003) ACPI: PCI Interrupt :03:00.1[A] - Link [LNKD] - GSI 11 (level, low) - IRQ 11 :03:00.1: ttyS1 at I/O 0xe080 (irq = 11) is a 16550A pccard: PCMCIA card inserted into slot 2 [rename_device:851]: Changing netdevice name from [eth1] to [eth3] ohci1394: fw-host0: AT dma reset ctx=0, aborting transmission ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting... ieee1394: Host added: ID:BUS[0-00:1023] GUID[374fc0002a71c021] [rename_device:1237]: Changing netdevice name from [eth2] to [eth1] cs: memory probe 0xf400-0xfbff: excluding 0xf400-0xf8ff 0xfa00-0xfbff pcmcia: registering new device pcmcia2.0 orinoco 0.15 (David Gibson [EMAIL PROTECTED], Pavel Roskin [EMAIL PROTECTED], et al) orinoco_cs 0.15 (David Gibson [EMAIL PROTECTED], Pavel Roskin [EMAIL PROTECTED], et al) pcmcia: request for exclusive IRQ could not be fulfilled. pcmcia: the driver needs updating to supported shared IRQ lines. 
cs: IO port probe 0x100-0x3af: excluding 0x370-0x37f cs: IO port probe 0x3e0-0x4ff: clean. cs: IO port probe 0x820-0x8ff: clean. cs: IO port probe 0xc00-0xcf7: clean. cs: IO port probe 0xa00-0xaff: clean. cs: IO port probe 0x100-0x3af: excluding 0x370-0x37f cs: IO port probe 0x3e0-0x4ff: clean. cs: IO port probe 0x820-0x8ff: clean. cs: IO port probe 0xc00-0xcf7: clean. cs: IO port probe 0xa00-0xaff: clean. cs: IO port probe 0x100-0x3af: excluding 0x370-0x37f cs: IO port probe 0x3e0-0x4ff: clean. cs: IO port probe 0x820-0x8ff: clean. cs: IO port probe 0xc00-0xcf7: clean. cs: IO port probe 0xa00-0xaff: clean. eth2: Hardware identity 0005:0004:0005: eth2: Station identity 001f:0001:0008:000a eth2: Firmware determined as Lucent/Agere 8.10 eth2: Ad-hoc demo mode supported eth2: IEEE standard IBSS ad-hoc mode supported eth2: WEP supported, 104-bit key eth2: MAC address 00:02:2D:5C:11:48 eth2: Station name HERMES I eth2: ready eth2: orinoco_cs at 2.0, irq 11, io 0xe100-0xe13f [rename_device:1295]: Changing netdevice name from [eth2] to [eth5] Non-volatile memory driver v1.2 and under -rc7-mm1, I see: pccard: CardBus card inserted into slot 0 PCI: Enabling device :03:00.0 ( - 0003) ACPI: PCI Interrupt :03:00.0[A] - Link [LNKD] - GSI 11 (level, low) - IRQ 11 PCI: Setting latency timer of device :03:00.0 to 64 eth1: Xircom cardbus revision 3 at irq 11 PCI: Enabling device :03:00.1 ( - 0003) ACPI: PCI Interrupt :03:00.1[A] - Link [LNKD] - GSI 11 (level, low) - IRQ 11 :03:00.1: ttyS1 at I/O 0xe080 (irq = 11) is a 16550A pccard: PCMCIA card inserted into slot 2 ohci1394: fw-host0: AT dma reset ctx=0, aborting transmission ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting... ieee1394: Host added: ID:BUS[0-00:1023] GUID[374fc0002a71c021] Non-volatile memory driver v1.2 Amazingly less chatty. 
Much later, when /etc/rc5.d/S10network runs, we finally see:

orinoco 0.15 (David Gibson [EMAIL PROTECTED], Pavel Roskin [EMAIL PROTECTED], et al)
orinoco_cs 0.15 (David Gibson [EMAIL PROTECTED], Pavel Roskin [EMAIL PROTECTED], et al)

but no output for the wireless configuring. Unless somebody has a better idea overnight, I'll start a bisect of -rc7-mm1 in the morning...
Re: 2.6.18-rc3-mm2 - IPV6_MULTIPLE_TABLES borked....
On Sun, 06 Aug 2006 03:08:09 PDT, Andrew Morton said: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc3/2.6.18-rc3-mm2/ Building a kernel with IPV6_MULTIPLE_TABLES=y breaks my IPv6 connectivity quite badly. It basically totally refuses to answer an IPv6 Neighbor Solicit packet or IPv6 Echo Request packet. I run a 'tcpdump -n ipv6', and I see the requests come in, and no packets leaving. Interestingly enough, if I try to ping6 *out* of the box, it's totally willing to send a Neighbor Solicit outbound (although it appears to totally ignore the Neighbor Advert packet that comes back). Of course, things don't work very well at all with busticated Neighbor Solicit. A kernel built with IPV6_MULTIPLE_TABLES=n works just fine. The relevant ifconfig (eth3 is a 100mbit port, eth5 is a wireless card): eth3 Link encap:Ethernet HWaddr 00:06:5B:EA:8E:4E inet addr:128.173.14.107 Bcast:128.173.15.255 Mask:255.255.252.0 inet6 addr: 2001:468:c80:2103:206:5bff:feea:8e4e/64 Scope:Global inet6 addr: fe80::206:5bff:feea:8e4e/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:15529 errors:0 dropped:0 overruns:1 frame:0 TX packets:2073 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:2333290 (2.2 MiB) TX bytes:228862 (223.4 KiB) Interrupt:11 Base address:0x6800 eth5 Link encap:Ethernet HWaddr 00:02:2D:5C:11:48 inet addr:198.82.168.129 Bcast:198.82.168.255 Mask:255.255.255.0 inet6 addr: 2001:468:c80:2181:202:2dff:fe5c:1148/64 Scope:Global inet6 addr: fe80::202:2dff:fe5c:1148/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:2096 errors:0 dropped:0 overruns:0 frame:0 TX packets:144 errors:1 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:280919 (274.3 KiB) TX bytes:22184 (21.6 KiB) Interrupt:11 Base address:0xe100 loLink encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:1583 errors:0 
dropped:0 overruns:0 frame:0 TX packets:1583 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:642598 (627.5 KiB) TX bytes:642598 (627.5 KiB) A working routing table: netstat -r -n -A inet6 Kernel IPv6 routing table Destination Next Hop Flags Metric RefUse Iface ::1/128 :: U 0 12 1 lo 2001:468:c80:2103:206:5bff:feea:8e4e/128:: U 0 41 lo 2001:468:c80:2103::/64 :: UA256113 0 eth3 2001:468:c80:2181:202:2dff:fe5c:1148/128:: U 0 01 lo 2001:468:c80:2181::/64 :: UA25611 0 eth5 fe80::202:2dff:fe5c:1148/128:: U 0 01 lo fe80::206:5bff:feea:8e4e/128:: U 0 21 lo fe80::/64 :: U 25600 eth3 fe80::/64 :: U 25600 eth5 ff02::1/128 ff02::1 UC0 113 0 eth3 ff02::1/128 ff02::1 UC0 10 eth5 ff00::/8:: U 25600 eth3 ff00::/8:: U 25600 eth5 ::/0fe80::20f:35ff:fe3e:d41a UGDA 1024 10 eth3 ::/0fe80::20f:35ff:fe3e:d41a UGDA 1024 10 eth5 pgp0hv0N6FUv3.pgp Description: PGP signature
Re: 2.6.18-rc3-mm2 - IPV6_MULTIPLE_TABLES borked....
On Thu, 10 Aug 2006 22:02:03 +0200, Patrick McHardy said:

> [EMAIL PROTECTED] wrote:
> > On Sun, 06 Aug 2006 03:08:09 PDT, Andrew Morton said:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc3/2.6.18-rc3-mm2/
> >
> > Building a kernel with IPV6_MULTIPLE_TABLES=y breaks my IPv6 connectivity
>
> It should be fixed by this patch (already contained in net-2.6.19).

Confirmed fixed, thanks...
2.6.18-rc3-mm2 - BUG in rt6_lookup() from ipv6_del_addr()
On Sun, 06 Aug 2006 03:08:09 PDT, Andrew Morton said: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc3/2.6.18-rc3-mm2/ After applying the patch that Patrick McHardy pointed me at, it lived longer. However, I'm now seeing problems at system shutdown (or anytime you try to 'ifdown ethX' where ethX has an IPv6 address attached to it): [ 196.346000] BUG: unable to handle kernel NULL pointer dereference at virtual address 0014 [ 196.347000] printing eip: [ 196.348000] c032c436 [ 196.348000] *pde = [ 196.349000] Oops: [#1] [ 196.349000] 4K_STACKS PREEMPT [ 196.349000] last sysfs file: /class/net/eth1/address [ 196.349000] Modules linked in: thermal sony_acpi processor fan button battery ac nfnetlink i8k floppy nvram orinoco_cs orinoco hermes pcmcia firmware_class ohci1394 ieee1394 intel_agp agpgart iTCO_wdt yenta_socket rsrc_nonstatic pcmcia_core rtc [ 196.349000] CPU:0 [ 196.349000] EIP:0060:[c032c436]Not tainted VLI [ 196.349000] EFLAGS: 00010246 (2.6.18-rc3-mm2 #4) [ 196.349000] EIP is at rt6_lookup+0x47/0x83 [ 196.349000] eax: ebx: ecx: 0005 edx: [ 196.349000] esi: e8b25c98 edi: e8b25c20 ebp: e8b25c78 esp: e8b25c20 [ 196.349000] ds: 007b es: 007b ss: 0068 [ 196.349000] Process ip (pid: 2511, ti=e8b25000 task=effb0aa0 task.ti=e8b25000) [ 196.349000] Stack: 0005 80fe [ 196.349000] [ 196.349000] 0008 eb6e98c8 e8b25ca8 e8b25cb4 c0327c04 [ 196.349000] Call Trace: [ 196.349000] [c0327c04] ipv6_del_addr+0x2ef/0x3a7 [ 196.349000] [c0327d3f] inet6_addr_del+0x83/0xbb [ 196.349000] [c0327dd6] inet6_rtm_deladdr+0x5f/0x6b [ 196.349000] [c02da097] rtnetlink_rcv_msg+0x1b3/0x1d6 [ 196.349000] [c02e011c] netlink_run_queue+0x5a/0xc6 [ 196.349000] [c02d9e9d] rtnetlink_rcv+0x29/0x42 [ 196.349000] [c02e0576] netlink_data_ready+0x12/0x49 [ 196.349000] [c02df518] netlink_sendskb+0x1c/0x4d [ 196.349000] [c02dfea0] netlink_unicast+0x1c4/0x1d0 [ 196.349000] [c02e0557] netlink_sendmsg+0x274/0x281 [ 196.349000] [c02ca57e] sock_sendmsg+0xeb/0x106 [ 196.349000] [c02cad99] 
sys_sendto+0xbe/0xdc [ 196.349000] [c02cb522] sys_socketcall+0xfb/0x186 [ 196.349000] [c0102849] sysenter_past_esp+0x56/0x79 [ 196.349000] DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x79 [ 196.349000] Leftover inexact backtrace: [ 196.349000] [c01036c7] show_stack_log_lvl+0x8c/0x97 [ 196.349000] [c010381f] show_registers+0x14d/0x1de [ 196.349000] [c0103a5b] die+0x1ab/0x26d [ 196.349000] [c0352205] do_page_fault+0x3f8/0x4c5 [ 196.349000] [c0351271] error_code+0x39/0x40 [ 196.349000] [c0327c04] ipv6_del_addr+0x2ef/0x3a7 [ 196.349000] [c0327d3f] inet6_addr_del+0x83/0xbb [ 196.349000] [c0327dd6] inet6_rtm_deladdr+0x5f/0x6b [ 196.349000] [c02da097] rtnetlink_rcv_msg+0x1b3/0x1d6 [ 196.349000] [c02e011c] netlink_run_queue+0x5a/0xc6 [ 196.349000] [c02d9e9d] rtnetlink_rcv+0x29/0x42 [ 196.349000] [c02e0576] netlink_data_ready+0x12/0x49 [ 196.349000] [c02df518] netlink_sendskb+0x1c/0x4d [ 196.349000] [c02dfea0] netlink_unicast+0x1c4/0x1d0 [ 196.349000] [c02e0557] netlink_sendmsg+0x274/0x281 [ 196.349000] [c02ca57e] sock_sendmsg+0xeb/0x106 [ 196.349000] [c02cad99] sys_sendto+0xbe/0xdc [ 196.349000] [c02cb522] sys_socketcall+0xfb/0x186 [ 196.349000] [c0102849] sysenter_past_esp+0x56/0x79 [ 196.349000] Code: eb ff 89 5d a8 8d 45 b0 b9 10 00 00 00 89 f2 e8 c9 e0 eb ff 31 d2 83 7d 08 00 0f 95 c2 b9 ad cc 32 c0 89 f8 e8 47 7c 01 00 89 c3 66 83 7b 14 00 74 2d 8b 43 04 85 c0 7f 21 68 c4 19 37 c0 68 99 [ 196.349000] EIP: [c032c436] rt6_lookup+0x47/0x83 SS:ESP 0068:e8b25c20 The unlucky 'ip' process then gets a SIGSEGV and dies while holding a lock of some sort, so later 'ip' processes get hung in 'D' state. Checking the lkml and netdev archives didn't find any useful hits for 'ipv6_addr_rel'... pgpPNQBNHkWRz.pgp Description: PGP signature
Re: 2.6.18-rc1-mm1 - VPN chewing CPU like crazy..
On Mon, 10 Jul 2006 23:19:39 PDT, Andrew Morton said:

> > Any suggestions/hints (besides rebuilding the implicated .ko with debugging symbols so oprofile can be more granular - that's already on the to-do list)?
>
> I'd suggest you whack sysrq-T 5-10 times when it happens, capture a few stack traces.

D-Oh! -- Homer Simpson.

I knew that. :)
Re: Router stops routing after changing MAC Address
On Mon, 13 Mar 2006 17:35:50 EST, linux-os (Dick Johnson) said:

> Bst... Not! There are not any MAC addresses associated with any of the intercity links, usually not even in WANs! MAC is for Ethernet! Once you go to fiber, ATM, T-N, etc., there are no MAC addresses.

This will come as a big surprise to those places running Gig-E and 10G-E links into a fiber for long-haul cross-country connectivity.