Re: [PATCH] bpfilter: fix user mode helper cross compilation
On Wed, 20 Jun 2018 16:04:34 +0200 Matteo Croce wrote: > Use $(OBJDUMP) instead of literal 'objdump' to avoid > using host toolchain when cross compiling. > I'm still having issues here, with ld. x86_64 machine, ARCH=i386: y:/usr/src/25> make V=1 M=net/bpfilter test -e include/generated/autoconf.h -a -e include/config/auto.conf || ( \ echo >&2; \ echo >&2 " ERROR: Kernel configuration is invalid."; \ echo >&2 " include/generated/autoconf.h or include/config/auto.conf are missing.";\ echo >&2 " Run 'make oldconfig && make prepare' on kernel src to fix it."; \ echo >&2 ; \ /bin/false) mkdir -p net/bpfilter/.tmp_versions ; rm -f net/bpfilter/.tmp_versions/* make -f ./scripts/Makefile.build obj=net/bpfilter (cat /dev/null; echo kernel/net/bpfilter/bpfilter.ko;) > net/bpfilter/modules.order ld -m elf_i386 -r -o net/bpfilter/bpfilter.o net/bpfilter/bpfilter_kern.o net/bpfilter/bpfilter_umh.o ; scripts/mod/modpost net/bpfilter/bpfilter.o ld: i386:x86-64 architecture of input file `net/bpfilter/bpfilter_umh.o' is incompatible with i386 output scripts/Makefile.build:530: recipe for target 'net/bpfilter/bpfilter.o' failed make[1]: *** [net/bpfilter/bpfilter.o] Error 1 Makefile:1518: recipe for target '_module_net/bpfilter' failed make: *** [_module_net/bpfilter] Error 2 y:/usr/src/25> ld --version GNU ld (GNU Binutils for Ubuntu) 2.29.1
Re: [lkp-robot] 486ad79630 [ 15.532543] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
On Wed, 2 May 2018 21:58:25 -0700 Cong Wang <xiyou.wangc...@gmail.com> wrote: > On Wed, May 2, 2018 at 9:27 PM, Andrew Morton <a...@linux-foundation.org> > wrote: > > > > So it's saying that something which got committed into Linus's tree > > after 4.17-rc3 has caused a NULL deref in > > sock_release->llc_ui_release+0x3a/0xd0 > > Do you mean it contains commit 3a04ce7130a7 > ("llc: fix NULL pointer deref for SOCK_ZAPPED")? That was in 4.17-rc3 so if this report's bisection is correct, that patch is innocent. origin.patch (http://ozlabs.org/~akpm/mmots/broken-out/origin.patch) contains no changes to net/llc/af_llc.c so perhaps this crash is also occurring in 4.17-rc3 base.
Re: [lkp-robot] 486ad79630 [ 15.532543] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
(networking cc's added) On Thu, 3 May 2018 12:14:50 +0800 kernel test robot <shun@intel.com> wrote: > Greetings, > > 0day kernel testing robot got the below dmesg and the first bad commit is > > git://git.cmpxchg.org/linux-mmotm.git master > > commit 486ad79630d0ba0b7205a8db9fe15ba392f5ee32 > Author: Andrew Morton <a...@linux-foundation.org> > AuthorDate: Fri Apr 20 22:00:53 2018 + > Commit: Johannes Weiner <han...@cmpxchg.org> > CommitDate: Fri Apr 20 22:00:53 2018 + > > origin OK, this got confusing. origin.patch is the diff between 4.17-rc3 and current mainline. > > [many lines deleted] > > [main] Setsockopt(101 c 1b24000 a) on fd 177 [3:5:240] > [main] Setsockopt(1 2c 1b24000 4) on fd 178 [5:2:0] > [main] Setsockopt(29 8 1b24000 4) on fd 180 [10:1:0] > [main] Setsockopt(1 20 1b24000 4) on fd 181 [26:2:125] > [main] Setsockopt(11 1 1b24000 4) on fd 183 [2:2:17] > [ 15.532543] BUG: unable to handle kernel NULL pointer dereference at > 0004 > [ 15.534143] PGD 80001734b067 P4D 80001734b067 PUD 17350067 PMD 0 > [ 15.535516] Oops: 0002 [#1] PTI > [ 15.536165] Modules linked in: > [ 15.536798] CPU: 0 PID: 363 Comm: trinity-main Not tainted > 4.17.0-rc1-1-g486ad79 #2 > [ 15.538396] RIP: 0010:llc_ui_release+0x3a/0xd0 > [ 15.539293] RSP: 0018:c915bd70 EFLAGS: 00010202 > [ 15.540345] RAX: 0001 RBX: 88001fa60008 RCX: > 0006 > [ 15.541802] RDX: 0006 RSI: 88001fdda660 RDI: > 88001fa60008 > [ 15.543139] RBP: c915bd80 R08: R09: > > [ 15.544725] R10: R11: R12: > > [ 15.546287] R13: 88001fa61730 R14: 88001e130a60 R15: > 880019bdb3f0 > [ 15.547962] FS: 7f2221bb1700() GS:82034000() > knlGS: > [ 15.549848] CS: 0010 DS: ES: CR0: 80050033 > [ 15.551186] CR2: 0004 CR3: 1734e000 CR4: > 06b0 > [ 15.552671] DR0: 02232000 DR1: DR2: > > [ 15.554105] DR3: DR6: 0ff0 DR7: > 0600 > [ 15.34] Call Trace: > [ 15.556049] sock_release+0x14/0x60 > [ 15.556767] sock_close+0xd/0x20 > [ 15.557427] __fput+0xba/0x1f0 > [ 15.558058] fput+0x9/0x10 > [ 15.558682] task_work_run+0x73/0xa0 > [ 15.559416] do_exit+0x231/0xab0 > [ 15.560079] do_group_exit+0x3f/0xc0 > [ 15.560810] __x64_sys_exit_group+0x13/0x20 > [ 15.561656] do_syscall_64+0x58/0x2f0 > [ 15.562407] ? trace_hardirqs_off_thunk+0x1a/0x1c > [ 15.563360] entry_SYSCALL_64_after_hwframe+0x49/0xbe > [ 15.564471] RIP: 0033:0x7f2221696408 > [ 15.565264] RSP: 002b:7ffe5c544c48 EFLAGS: 0206 ORIG_RAX: > 00e7 > [ 15.566924] RAX: ffda RBX: RCX: > 7f2221696408 > [ 15.568485] RDX: RSI: 003c RDI: > > [ 15.570046] RBP: R08: 00e7 R09: > ffa0 > [ 15.571603] R10: 7ffe5c5449e0 R11: 0206 R12: > 0004 > [ 15.573160] R13: 7ffe5c544e30 R14: R15: > > [ 15.574720] Code: 7b ff 43 78 0f 88 a5 6f 14 00 31 f6 48 89 df e8 ad 33 fb > ff 48 89 df e8 55 94 ff ff 85 c0 0f 84 84 00 00 00 4c 8b a3 d8 04 00 00 <41> > ff 44 24 04 0f 88 7f 6f 14 00 48 8b 43 58 f6 c4 01 74 58 48 > [ 15.578679] RIP: llc_ui_release+0x3a/0xd0 RSP: c915bd70 > [ 15.579874] CR2: 0004 > [ 15.580553] ---[ end trace 0dd8fdc6b7182234 ]--- > So it's saying that something which got committed into Linus's tree after 4.17-rc3 has caused a NULL deref in sock_release->llc_ui_release+0x3a/0xd0
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Tue, 24 Apr 2018 12:33:01 -0400 (EDT) Mikulas Patockawrote: > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > On Tue 24-04-18 11:30:40, Mikulas Patocka wrote: > > > > > > > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > > > > > On Mon 23-04-18 20:25:15, Mikulas Patocka wrote: > > > > > > > > > Fixing __vmalloc code > > > > > is easy and it doesn't require cooperation with maintainers. > > > > > > > > But it is a hack against the intention of the scope api. > > > > > > It is not! > > > > This discussion simply doesn't make much sense it seems. The scope API > > is to document the scope of the reclaim recursion critical section. That > > certainly is not a utility function like vmalloc. > > That 15-line __vmalloc bugfix doesn't prevent you (or any other kernel > developer) from converting the code to the scope API. You make nonsensical > excuses. > Fun thread! Winding back to the original problem, I'd state it as - Caller uses kvmalloc() but passes the address into vmalloc-naive DMA API and - Caller uses kvmalloc() but passes the address into kfree() Yes? If so, then... Is there a way in which, in the kvmalloc-called-kmalloc path, we can tag the slab-allocated memory with a "this memory was allocated with kvmalloc()" flag? I *think* there's extra per-object storage available with suitable slab/slub debugging options? Perhaps we could steal one bit from the redzone, dunno. If so then we can a) set that flag in kvmalloc() if the kmalloc() call succeeded b) check for that flag in the DMA code, WARN if it is set. c) in kvfree(), clear that flag before calling kfree() d) in kfree(), check for that flag and go WARN() if set. So both potential bugs are detected all the time, dependent upon CONFIG_SLUB_DEBUG (and perhaps other slub config options).
Re: simplify procfs code for seq_file instances
On Tue, 24 Apr 2018 16:23:04 +0200 Christoph Hellwigwrote: > On Thu, Apr 19, 2018 at 09:57:50PM +0300, Alexey Dobriyan wrote: > > > git://git.infradead.org/users/hch/misc.git proc_create > > > > > > I want to ask if it is time to start using poorman function overloading > > with _b_c_e(). There are millions of allocation functions for example, > > all slightly difference, and people will add more. Seeing /proc interfaces > > doubled like this is painful. > > Function overloading is totally unacceptable. > > And I very much disagree with a tradeoff that keeps 5000 lines of > code vs a few new helpers. OK, the curiosity and suspense are killing me. What the heck is "function overloading with _b_c_e()"?
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Thu, 19 Apr 2018 17:19:20 -0400 (EDT) Mikulas Patockawrote: > > > In order to detect these bugs reliably I submit this patch that changes > > > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on. > > > > > > ... > > > > > > --- linux-2.6.orig/mm/util.c 2018-04-18 15:46:23.0 +0200 > > > +++ linux-2.6/mm/util.c 2018-04-18 16:00:43.0 +0200 > > > @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap); > > > */ > > > void *kvmalloc_node(size_t size, gfp_t flags, int node) > > > { > > > +#ifndef CONFIG_DEBUG_VM > > > gfp_t kmalloc_flags = flags; > > > void *ret; > > > > > > @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f > > >*/ > > > if (ret || size <= PAGE_SIZE) > > > return ret; > > > +#endif > > > > > > return __vmalloc_node_flags_caller(size, node, flags, > > > __builtin_return_address(0)); > > > > Well, it doesn't have to be done at compile-time, does it? We could > > add a knob (in debugfs, presumably) which enables this at runtime. > > That's far more user-friendly. > > But who will turn it on in debugfs? But who will turn it on in Kconfig? Just a handful of developers. We could add SONFIG_DEBUG_SG to the list in Documentation/process/submit-checklist.rst, but nobody reads that. Also, a whole bunch of defconfigs set CONFIG_DEBUG_SG=y and some googling indicates that they aren't the only ones... > It should be default for debugging > kernels, so that users using them would report the error. Well. This isn't the first time we've wanted to enable expensive (or noisy) debugging things in debug kernels, by any means. So how could we define a debug kernel in which it's OK to enable such things? - Could be "it's an -rc kernel". But then we'd be enabling a bunch of untested code when Linus cuts a release. - Could be "it's an -rc kernel with SUBLEVEL <= 5". But then we risk unexpected things happening when Linux cuts -rc6, which still isn't good. - How about "it's an -rc kernel with odd-numbered SUBLEVEL and SUBLEVEL <= 5". That way everybody who runs -rc1, -rc3 and -rc5 will have kvmalloc debugging enabled. That's potentially nasty because vmalloc is much slower than kmalloc. But kvmalloc() is only used for large and probably infrequent allocations, so it's probably OK. I wonder how we get at SUBLEVEL from within .c.
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Thu, 19 Apr 2018 12:12:38 -0400 (EDT) Mikulas Patockawrote: > The kvmalloc function tries to use kmalloc and falls back to vmalloc if > kmalloc fails. > > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then > uses DMA-API on the returned memory or frees it with kfree. Such bugs were > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific > code. > > These bugs are hard to reproduce because vmalloc falls back to kmalloc > only if memory is fragmented. Yes, that's nasty. > In order to detect these bugs reliably I submit this patch that changes > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on. > > ... > > --- linux-2.6.orig/mm/util.c 2018-04-18 15:46:23.0 +0200 > +++ linux-2.6/mm/util.c 2018-04-18 16:00:43.0 +0200 > @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap); > */ > void *kvmalloc_node(size_t size, gfp_t flags, int node) > { > +#ifndef CONFIG_DEBUG_VM > gfp_t kmalloc_flags = flags; > void *ret; > > @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f >*/ > if (ret || size <= PAGE_SIZE) > return ret; > +#endif > > return __vmalloc_node_flags_caller(size, node, flags, > __builtin_return_address(0)); Well, it doesn't have to be done at compile-time, does it? We could add a knob (in debugfs, presumably) which enables this at runtime. That's far more user-friendly.
Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()
On Mon, 12 Mar 2018 21:28:57 -0700 Kees Cook <keesc...@chromium.org> wrote: > On Mon, Mar 12, 2018 at 4:57 PM, Linus Torvalds > <torva...@linux-foundation.org> wrote: > > On Mon, Mar 12, 2018 at 3:55 PM, Andrew Morton > > <a...@linux-foundation.org> wrote: > >> > >> Replacing the __builtin_choose_expr() with ?: works of course. > > > > Hmm. That sounds like the right thing to do. We were so myopically > > staring at the __builtin_choose_expr() problem that we overlooked the > > obvious solution. > > > > Using __builtin_constant_p() together with a ?: is in fact our common > > pattern, so that should be fine. The only real reason to use > > __builtin_choose_expr() is if you want to get the *type* to vary > > depending on which side you choose, but that's not an issue for > > min/max. > > This doesn't solve it for -Wvla, unfortunately. That was the point of > Josh's original suggestion of __builtin_choose_expr(). > > Try building with KCFLAGS=-Wval and checking net/ipv6/proc.c: > > net/ipv6/proc.c: In function ‘snmp6_seq_show_item’: > net/ipv6/proc.c:198:2: warning: ISO C90 forbids array ‘buff’ whose > size can’t be evaluated [-Wvla] > unsigned long buff[SNMP_MIB_MAX]; > ^~~~ PITA. Didn't we once have a different way of detecting VLAs? Some post-compilation asm parser, iirc. I suppose the world wouldn't end if we had a gcc version ifdef in kernel.h. We'll get to remove it in, oh, ten years.
Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()
On Fri, 9 Mar 2018 17:30:15 -0800 Kees Cookwrote: > > It's one reason why I wondered if simplifying the expression to have > > just that single __builtin_constant_p() might not end up working.. > > Yeah, it seems like it doesn't bail out as "false" for complex > expressions given to __builtin_constant_p(). > > If no magic solution, then which of these? > > - go back to my original "multi-eval max only for constants" macro (meh) > - add gcc version checks around this and similarly for -Wvla in the future > (eww) > - raise gcc version (yikes) Replacing the __builtin_choose_expr() with ?: works of course. What will be the runtime effects? I tried replacing __builtin_choose_expr(__builtin_constant_p(x) && __builtin_constant_p(y), with __builtin_choose_expr(__builtin_constant_p(x + y), but no success. I'm not sure what else to try to trick gcc into working. --- a/include/linux/kernel.h~kernelh-skip-single-eval-logic-on-literals-in-min-max-v3-fix +++ a/include/linux/kernel.h @@ -804,13 +804,10 @@ static inline void ftrace_dump(enum ftra * accidental VLA. */ #define __min(t1, t2, x, y)\ - __builtin_choose_expr(__builtin_constant_p(x) &&\ - __builtin_constant_p(y), \ - (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\ - __single_eval_min(t1, t2, \ - __UNIQUE_ID(min1_), \ - __UNIQUE_ID(min2_), \ - x, y)) + ((__builtin_constant_p(x) && __builtin_constant_p(y)) ? \ + ((t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y)) : \ + (__single_eval_min(t1, t2, __UNIQUE_ID(min1_), \ + __UNIQUE_ID(min2_), x, y))) /** * min - return minimum of two values of the same or compatible types @@ -826,13 +823,11 @@ static inline void ftrace_dump(enum ftra max1 > max2 ? max1 : max2; }) #define __max(t1, t2, x, y)\ - __builtin_choose_expr(__builtin_constant_p(x) &&\ - __builtin_constant_p(y), \ - (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\ - __single_eval_max(t1, t2, \ - __UNIQUE_ID(max1_), \ - __UNIQUE_ID(max2_), \ - x, y)) + ((__builtin_constant_p(x) && __builtin_constant_p(y)) ? \ + ((t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y)) : \ + (__single_eval_max(t1, t2, __UNIQUE_ID(max1_), \ + __UNIQUE_ID(max2_), x, y))) + /** * max - return maximum of two values of the same or compatible types * @x: first value _
Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()
On Fri, 9 Mar 2018 16:28:51 -0800 Linus Torvalds <torva...@linux-foundation.org> wrote: > On Fri, Mar 9, 2018 at 4:07 PM, Andrew Morton <a...@linux-foundation.org> > wrote: > > > > A brief poke failed to reveal a workaround - gcc-4.4.4 doesn't appear > > to know that __builtin_constant_p(x) is a constant. Or something. > > LOL. > > I suspect it might be that it wants to evaluate > __builtin_choose_expr() at an earlier stage than it evaluates > __builtin_constant_p(), so it's not that it doesn't know that > __builtin_constant_p() is a constant, it just might not know it *yet*. > > Maybe. > > Side note, if it's not that, but just the "complex" expression that > has the logical 'and' etc, maybe the code could just use > > __builtin_constant_p((x)+(y)) > > or something. I'll do a bit more poking at it. > But yeah: > > > Sigh. Wasn't there some talk about modernizing our toolchain > > requirements? > > Maybe it's just time to give up on 4.4. We wanted 4.5 for "asm goto", > and once we upgrade to 4.5 I think Arnd said that no distro actually > ships it, so we might as well go to 4.6. > > So maybe this is just the excuse to finally make that official, if > there is no clever workaround any more. I wonder which gcc versions actually accept Kees's addition.
Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()
On Fri, 9 Mar 2018 12:05:36 -0800 Kees Cookwrote: > When max() is used in stack array size calculations from literal values > (e.g. "char foo[max(sizeof(struct1), sizeof(struct2))]", the compiler > thinks this is a dynamic calculation due to the single-eval logic, which > is not needed in the literal case. This change removes several accidental > stack VLAs from an x86 allmodconfig build: > > $ diff -u before.txt after.txt | grep ^- > -drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids > variable length array ‘ids’ [-Wvla] > -fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length > array ‘namebuf’ [-Wvla] > -lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ > [-Wvla] > -net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array ‘buff’ > [-Wvla] > -net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array ‘buff’ > [-Wvla] > -net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array > ‘buff64’ [-Wvla] > > Based on an earlier patch from Josh Poimboeuf. v1, v2 and v3 of this patch all fail with gcc-4.4.4: ./include/linux/jiffies.h: In function 'jiffies_delta_to_clock_t': ./include/linux/jiffies.h:444: error: first argument to '__builtin_choose_expr' not a constant That's with #define __max(t1, t2, x, y) \ __builtin_choose_expr(__builtin_constant_p(x) &&\ __builtin_constant_p(y) &&\ __builtin_types_compatible_p(t1, t2), \ (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\ __single_eval_max(t1, t2, \ __UNIQUE_ID(max1_), \ __UNIQUE_ID(max2_), \ x, y)) /** * max - return maximum of two values of the same or compatible types * @x: first value * @y: second value */ #define max(x, y) __max(typeof(x), typeof(y), x, y) A brief poke failed to reveal a workaround - gcc-4.4.4 doesn't appear to know that __builtin_constant_p(x) is a constant. Or something. Sigh. Wasn't there some talk about modernizing our toolchain requirements?
Re: [PATCH] net/9p: avoid -ERESTARTSYS leak to userspace
On Fri, 09 Mar 2018 21:41:38 +0100 Greg Kurzwrote: > If it was interrupted by a signal, the 9p client may need to send some > more requests to the server for cleanup before returning to userspace. > > To avoid such a last minute request to be interrupted right away, the > client memorizes if a signal is pending, clears TIF_SIGPENDING, handles > the request and calls recalc_sigpending() before returning. > > Unfortunately, if the transmission of this cleanup request fails for any > reason, the transport returns an error and the client propagates it right > away, without calling recalc_sigpending(). > > This ends up with -ERESTARTSYS from the initially interrupted request > crawling up to syscall exit, with TIF_SIGPENDING cleared by the cleanup > request. The specific signal handling code, which is responsible for > converting -ERESTARTSYS to -EINTR is not called, and userspace receives > the confusing errno value: > > open: Unknown error 512 (512) > > This is really hard to hit in real life. I discovered the issue while > working on hot-unplug of a virtio-9p-pci device with an instrumented > QEMU allowing to control request completion. > > Both p9_client_zc_rpc() and p9_client_rpc() functions have this buggy > error path actually. Their code flow is a bit obscure and the best > thing to do would probably be a full rewrite: to really ensure this > situation of clearing TIF_SIGPENDING and returning -ERESTARTSYS can > never happen. > > But given the general lack of interest for the 9p code, I won't risk > breaking more things. So this patch simply fixes the buggy paths in > both functions with a trivial label+goto. > > Thanks to Laurent Dufour for his help and suggestions on how to find > the root cause and how to fix it. That's a fairly straightforward-looking bug. However the code still looks a bit racy: : if (signal_pending(current)) { : sigpending = 1; : clear_thread_flag(TIF_SIGPENDING); : } else : sigpending = 0; : what happens if signal_pending(current) becomes true right here? : err = c->trans_mod->request(c, req); I'm surprised that the 9p client is mucking with signals at all. Signals are a userspace IPC scheme and kernel code should instead be using the more powerful messaging mechanisms which we've developed. Ones which don't disrupt userspace state. Why is this happening? Is there some userspace/kernel interoperation involved?
Re: [PATCH] kernel.h: Skip single-eval logic on literals in min()/max()
On Thu, 8 Mar 2018 13:40:45 -0800 Kees Cookwrote: > When max() is used in stack array size calculations from literal values > (e.g. "char foo[max(sizeof(struct1), sizeof(struct2))]", the compiler > thinks this is a dynamic calculation due to the single-eval logic, which > is not needed in the literal case. This change removes several accidental > stack VLAs from an x86 allmodconfig build: > > $ diff -u before.txt after.txt | grep ^- > -drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids > variable length array ‘ids’ [-Wvla] > -fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length > array ‘namebuf’ [-Wvla] > -lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ > [-Wvla] > -net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array ‘buff’ > [-Wvla] > -net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array ‘buff’ > [-Wvla] > -net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array > ‘buff64’ [-Wvla] > > Based on an earlier patch from Josh Poimboeuf. > > ... > > --- a/include/linux/kernel.h > +++ b/include/linux/kernel.h > @@ -787,37 +787,57 @@ static inline void ftrace_dump(enum ftrace_dump_mode > oops_dump_mode) { } > * strict type-checking.. See the > * "unnecessary" pointer comparison. > */ > -#define __min(t1, t2, min1, min2, x, y) ({ \ > +#define __single_eval_min(t1, t2, min1, min2, x, y) ({ \ > t1 min1 = (x); \ > t2 min2 = (y); \ > (void) ( == );\ > min1 < min2 ? min1 : min2; }) > > +/* > + * In the case of builtin constant values, there is no need to do the > + * double-evaluation protection, so the raw comparison can be made. > + * This allows min()/max() to be used in stack array allocations and > + * avoid the compiler thinking it is a dynamic value leading to an > + * accidental VLA. > + */ > +#define __min(t1, t2, x, y) \ > + __builtin_choose_expr(__builtin_constant_p(x) &&\ > + __builtin_constant_p(y) &&\ > + __builtin_types_compatible_p(t1, t2), \ > + (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\ > + __single_eval_min(t1, t2, \ > + __UNIQUE_ID(max1_), \ > + __UNIQUE_ID(max2_), \ > + x, y)) > + Holy crap. I suppose gcc will one day be fixed and we won't need this. Is there a good reason to convert min()? Surely nobody will be using min to dimension an array - always max? Just for symmetry, I guess.
Re: [PATCH 0/3] Remove accidental VLA usage
On Thu, 8 Mar 2018 09:02:36 -0600 Josh Poimboeufwrote: > On Wed, Mar 07, 2018 at 07:30:44PM -0800, Kees Cook wrote: > > This series adds SIMPLE_MAX() to be used in places where a stack array > > is actually fixed, but the compiler still warns about VLA usage due to > > confusion caused by the safety checks in the max() macro. > > > > I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(), > > and they should all have no operational differences. > > What if we instead simplify the max() macro's type checking so that GCC > can more easily fold the array size constants? The below patch seems to > work: > > -/* > - * min()/max()/clamp() macros that also do > - * strict type-checking.. See the > - * "unnecessary" pointer comparison. > - */ > -#define __min(t1, t2, min1, min2, x, y) ({ \ > - t1 min1 = (x); \ > - t2 min2 = (y); \ > - (void) ( == );\ > - min1 < min2 ? min1 : min2; }) > +extern long __error_incompatible_types_in_min_macro; > +extern long __error_incompatible_types_in_max_macro; > + > +#define __min(t1, t2, x, y) \ > + __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \ > + (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\ > + (t1)__error_incompatible_types_in_min_macro) This will move the error detection from compile-time to link-time. That's tolerable I guess, but a bit sad and should be flagged in the changelog at least.
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Wed, 7 Feb 2018 18:44:39 +0100 Pablo Neira Ayuso <pa...@netfilter.org> wrote: > Hi, > > On Wed, Jan 31, 2018 at 09:19:16AM +0100, Michal Hocko wrote: > [...] > > Yeah, we do not BUG but rather fail instead. See __vmalloc_node_range. > > My excavation tools pointed me to "VM: Rework vmalloc code to support > > mapping of arbitray pages" > > by Christoph back in 2002. So yes, we can safely remove it finally. Se > > below. > > > > > > From 8d52e1d939d101b0dafed6ae5c3c1376183e65bb Mon Sep 17 00:00:00 2001 > > From: Michal Hocko <mho...@suse.com> > > Date: Wed, 31 Jan 2018 09:16:56 +0100 > > Subject: [PATCH] net/netfilter/x_tables.c: remove size check > > > > Back in 2002 vmalloc used to BUG on too large sizes. We are much better > > behaved these days and vmalloc simply returns NULL for those. Remove > > the check as it simply not needed and the comment even misleading. > > > > Suggested-by: Andrew Morton <a...@linux-foundation.org> > > Signed-off-by: Michal Hocko <mho...@suse.com> > > --- > > net/netfilter/x_tables.c | 4 > > 1 file changed, 4 deletions(-) > > > > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c > > index b55ec5aa51a6..48a6ff620493 100644 > > --- a/net/netfilter/x_tables.c > > +++ b/net/netfilter/x_tables.c > > @@ -999,10 +999,6 @@ struct xt_table_info *xt_alloc_table_info(unsigned int > > size) > > if (sz < sizeof(*info)) > > return NULL; > > > > - /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */ > > - if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages) > > - return NULL; > > - > > /* __GFP_NORETRY is not fully supported by kvmalloc but it should > > * work reasonably well if sz is too large and bail out rather > > * than shoot all processes down before realizing there is nothing > > Patchwork didn't catch this patch for some reason, would you mind to > resend? From: Michal Hocko <mho...@suse.com> Subject: net/netfilter/x_tables.c: remove size check Back in 2002 vmalloc used to BUG on too large sizes. We are much better behaved these days and vmalloc simply returns NULL for those. Remove the check as it simply not needed and the comment is even misleading. Link: http://lkml.kernel.org/r/20180131081916.go21...@dhcp22.suse.cz Suggested-by: Andrew Morton <a...@linux-foundation.org> Signed-off-by: Michal Hocko <mho...@suse.com> Reviewed-by: Andrew Morton <a...@linux-foundation.org> Cc: Florian Westphal <f...@strlen.de> Cc: David S. Miller <da...@davemloft.net> Signed-off-by: Andrew Morton <a...@linux-foundation.org> --- net/netfilter/x_tables.c |4 1 file changed, 4 deletions(-) diff -puN net/netfilter/x_tables.c~net-netfilter-x_tablesc-remove-size-check net/netfilter/x_tables.c --- a/net/netfilter/x_tables.c~net-netfilter-x_tablesc-remove-size-check +++ a/net/netfilter/x_tables.c @@ -1004,10 +1004,6 @@ struct xt_table_info *xt_alloc_table_inf if (sz < sizeof(*info)) return NULL; - /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */ - if ((size >> PAGE_SHIFT) + 2 > totalram_pages) - return NULL; - /* __GFP_NORETRY is not fully supported by kvmalloc but it should * work reasonably well if sz is too large and bail out rather * than shoot all processes down before realizing there is nothing _
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue, 30 Jan 2018 15:01:04 +0100 Michal Hockowrote: > > Well, this is not about syzkaller, it merely pointed out a potential > > DoS... And that has to be addressed somehow. > > So how about this? > --- argh ;) > >From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001 > From: Michal Hocko > Date: Tue, 30 Jan 2018 14:51:07 +0100 > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive > > syzbot has noticed that xt_alloc_table_info can allocate a lot of > memory. This is an admin only interface but an admin in a namespace > is sufficient as well. eacd86ca3b03 ("net/netfilter/x_tables.c: use > kvmalloc() in xt_alloc_table_info()") has changed the opencoded > kmalloc->vmalloc fallback into kvmalloc. It has dropped __GFP_NORETRY on > the way because vmalloc has simply never fully supported __GFP_NORETRY > semantic. This is still the case because e.g. page tables backing the > vmalloc area are hardcoded GFP_KERNEL. > > Revert back to __GFP_NORETRY as a poors man defence against excessively > large allocation request here. We will not rule out the OOM killer > completely but __GFP_NORETRY should at least stop the large request > in most cases. > > Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in > xt_alloc_table_info()") > Signed-off-by: Michal Hocko > --- > net/netfilter/x_tables.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c > index d8571f414208..a5f5c29bcbdc 100644 > --- a/net/netfilter/x_tables.c > +++ b/net/netfilter/x_tables.c > @@ -1003,7 +1003,13 @@ struct xt_table_info *xt_alloc_table_info(unsigned int > size) > if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages) > return NULL; offtopic: preceding comment here is "prevent them from hitting BUG() in vmalloc.c". I suspect this is ancient code and vmalloc sure as heck shouldn't go BUG with this input. And it should be using `sz' ;) So I suspect and hope that this code can be removed. If not, let's fix vmalloc! > - info = kvmalloc(sz, GFP_KERNEL); > + /* > + * __GFP_NORETRY is not fully supported by kvmalloc but it should > + * work reasonably well if sz is too large and bail out rather > + * than shoot all processes down before realizing there is nothing > + * more to reclaim. > + */ > + info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY); > if (!info) > return NULL; checkpatch sayeth networking block comments don't use an empty /* line, use /* Comment... So I'll do that and shall scoot the patch Davewards.
Re: [PATCH V11 4/5] vsprintf: add printk specifier %px
On Wed, 29 Nov 2017 13:05:04 +1100 "Tobin C. Harding"wrote: > printk specifier %p now hashes all addresses before printing. Sometimes > we need to see the actual unmodified address. This can be achieved using > %lx but then we face the risk that if in future we want to change the > way the Kernel handles printing of pointers we will have to grep through > the already existent 50 000 %lx call sites. Let's add specifier %px as a > clear, opt-in, way to print a pointer and maintain some level of > isolation from all the other hex integer output within the Kernel. > > Add printk specifier %px to print the actual unmodified address. > > ... > > +Unmodified Addresses > + > + > +:: > + > + %px 01234567 or 0123456789abcdef > + > +For printing pointers when you _really_ want to print the address. Please > +consider whether or not you are leaking sensitive information about the > +Kernel layout in memory before printing pointers with %px. %px is > +functionally equivalent to %lx. %px is preferred to %lx because it is more > +uniquely grep'able. If, in the future, we need to modify the way the Kernel > +handles printing pointers it will be nice to be able to find the call > +sites. > + You might want to add a checkpatch rule which emits a stern do-you-really-want-to-do-this warning when someone uses %px.
Re: [PATCH V11 3/5] printk: hash addresses printed with %p
On Wed, 29 Nov 2017 13:05:03 +1100 "Tobin C. Harding"wrote: > Currently there exist approximately 14 000 places in the kernel where > addresses are being printed using an unadorned %p. This potentially > leaks sensitive information regarding the Kernel layout in memory. Many > of these calls are stale, instead of fixing every call lets hash the > address by default before printing. This will of course break some > users, forcing code printing needed addresses to be updated. > > Code that _really_ needs the address will soon be able to use the new > printk specifier %px to print the address. > > For what it's worth, usage of unadorned %p can be broken down as > follows (thanks to Joe Perches). > > $ git grep -E '%p[^A-Za-z0-9]' | cut -f1 -d"/" | sort | uniq -c >1084 arch > 20 block > 10 crypto > 32 Documentation >8121 drivers >1221 fs > 143 include > 101 kernel > 69 lib > 100 mm >1510 net > 40 samples > 7 scripts > 11 security > 166 sound > 152 tools > 2 virt > > Add function ptr_to_id() to map an address to a 32 bit unique > identifier. Hash any unadorned usage of specifier %p and any malformed > specifiers. > > ... > > @@ -1644,6 +1646,73 @@ char *device_node_string(char *buf, char *end, struct > device_node *dn, > return widen_string(buf, buf - buf_start, end, spec); > } > > +static bool have_filled_random_ptr_key __read_mostly; > +static siphash_key_t ptr_key __read_mostly; > + > +static void fill_random_ptr_key(struct random_ready_callback *unused) > +{ > + get_random_bytes(_key, sizeof(ptr_key)); > + /* > + * have_filled_random_ptr_key==true is dependent on get_random_bytes(). > + * ptr_to_id() needs to see have_filled_random_ptr_key==true > + * after get_random_bytes() returns. > + */ > + smp_mb(); > + WRITE_ONCE(have_filled_random_ptr_key, true); > +} I don't think I'm seeing anything which prevents two CPUs from initializing ptr_key at the same time. Probably doesn't matter much...
Re: [PATCH V11 0/5] hash addresses printed with %p
On Wed, 29 Nov 2017 13:05:00 +1100 "Tobin C. Harding"wrote: > Currently there exist approximately 14 000 places in the Kernel where > addresses are being printed using an unadorned %p. This potentially > leaks sensitive information regarding the Kernel layout in memory. Many > of these calls are stale, instead of fixing every call lets hash the > address by default before printing. This will of course break some > users, forcing code printing needed addresses to be updated. We can add > a printk specifier for this purpose (%px) to give developers a clear > upgrade path for breakages caused by applying this patch set. > > The added advantage of hashing %p is that security is now opt-out, if > you _really_ want the address you have to work a little harder and use > %px. > > The idea for creating the printk specifier %px to print the actual > address was suggested by Kees Cook (see below for email threads by > subject). Maybe I'm being thick, but... if we're rendering these addresses unusable by hashing them, why not just print something like "" in their place? That loses the uniqueness thing but I wonder how valuable that is in practice?
Re: [PATCH] mm, page_alloc: re-enable softirq use of per-cpu page allocator
On Mon, 10 Apr 2017 16:08:21 +0100 Mel Gormanwrote: > IRQ context were excluded from using the Per-Cpu-Pages (PCP) lists caching > of order-0 pages in commit 374ad05ab64d ("mm, page_alloc: only use per-cpu > allocator for irq-safe requests"). > > This unfortunately also included excluded SoftIRQ. This hurt the performance > for the use-case of refilling DMA RX rings in softirq context. Out of curiosity: by how much did it "hurt"? Tariq found: : I disabled the page-cache (recycle) mechanism to stress the page : allocator, and see a drastic degradation in BW, from 47.5 G in v4.10 to : 31.4 G in v4.11-rc1 (34% drop). then with this patch he found : It looks very good! I get line-rate (94Gbits/sec) with 8 streams, in : comparison to less than 55Gbits/sec before. Can I take this to mean that the page allocator's per-cpu-pages feature ended up doubling the performance of this driver? Better than the driver's private page recycling? I'd like to believe that, but am having trouble doing so ;) > This patch re-allow softirq context, which should be safe by disabling > BH/softirq, while accessing the list. PCP-lists access from both hard-IRQ > and NMI context must not be allowed. Peter Zijlstra says in_nmi() code > never access the page allocator, thus it should be sufficient to only test > for !in_irq(). > > One concern with this change is adding a BH (enable) scheduling point at > both PCP alloc and free. If further concerns are highlighted by this patch, > the result wiill be to revert 374ad05ab64d and try again at a later date > to offset the irq enable/disable overhead.
Re: [mm PATCH 0/3] Page fragment updates
On Mon, 5 Dec 2016 09:01:12 -0800 Alexander Duyckwrote: > On Tue, Nov 29, 2016 at 10:23 AM, Alexander Duyck > wrote: > > This patch series takes care of a few cleanups for the page fragments API. > > > > ... > > It's been about a week since I submitted this series. Just wanted to > check in and see if anyone had any feedback or if this is good to be > accepted for 4.10-rc1 with the rest of the set? Looks good to me. I have it all queued for post-4.9 processing.
Re: [mm PATCH v3 21/23] mm: Add support for releasing multiple instances of a page
On Mon, 21 Nov 2016 08:21:39 -0800 Alexander Duyckwrote: > >> + __free_pages_ok(page, order); > >> + } > >> +} > >> +EXPORT_SYMBOL(__page_frag_drain); > > > > It's an exported-to-modules library function. It should be documented, > > please? The page-frag API is only partially documented, but that's no > > excuse. > > Okay. I assume you want the documentation as a follow-up patch since > I received a notice that the patch was added to -mm? Yes please. Or a replacement patch which I'll temporarily turn into a delta, either is fine. > If you would like I could look at doing a couple of renaming patches > so that we make the API a bit more consistent. I could move the > __alloc and __free to what you have suggested, and then take a look at > trying to rename the refill/drain to be a bit more consistent in terms > of what they are supposed to work on and how they are supposed to be > used. I think that would be better - it's hardly high-priority but a bit of attention to the documentation and naming conventions would help tidy things up. When you can't find anything else to do ;)
Re: [mm PATCH v3 21/23] mm: Add support for releasing multiple instances of a page
On Thu, 10 Nov 2016 06:36:06 -0500 Alexander Duyckwrote: > This patch adds a function that allows us to batch free a page that has > multiple references outstanding. Specifically this function can be used to > drop a page being used in the page frag alloc cache. With this drivers can > make use of functionality similar to the page frag alloc cache without > having to do any workarounds for the fact that there is no function that > frees multiple references. > > ... > > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -506,6 +506,8 @@ extern void free_hot_cold_page(struct page *page, bool > cold); > extern void free_hot_cold_page_list(struct list_head *list, bool cold); > > struct page_frag_cache; > +extern void __page_frag_drain(struct page *page, unsigned int order, > + unsigned int count); > extern void *__alloc_page_frag(struct page_frag_cache *nc, > unsigned int fragsz, gfp_t gfp_mask); > extern void __free_page_frag(void *addr); > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 0fbfead..54fea40 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3912,6 +3912,20 @@ static struct page *__page_frag_refill(struct > page_frag_cache *nc, > return page; > } > > +void __page_frag_drain(struct page *page, unsigned int order, > +unsigned int count) > +{ > + VM_BUG_ON_PAGE(page_ref_count(page) == 0, page); > + > + if (page_ref_sub_and_test(page, count)) { > + if (order == 0) > + free_hot_cold_page(page, false); > + else > + __free_pages_ok(page, order); > + } > +} > +EXPORT_SYMBOL(__page_frag_drain); It's an exported-to-modules library function. It should be documented, please? The page-frag API is only partially documented, but that's no excuse. And perhaps documentation will help explain the naming choice. Why "drain"? I'd have expected "put"? And why the leading underscores. The page-frag API is pretty weird :( And inconsistent. __alloc_page_frag -> page_frag_alloc, __free_page_frag -> page_frag_free(), etc. I must have been asleep when I let that lot through.
Re: [PATCH v2] fs/select: add vmalloc fallback for select(2)
On Thu, 22 Sep 2016 18:43:59 +0200 Vlastimil Babkawrote: > The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows > with the number of fds passed. We had a customer report page allocation > failures of order-4 for this allocation. This is a costly order, so it might > easily fail, as the VM expects such allocation to have a lower-order fallback. > > Such trivial fallback is vmalloc(), as the memory doesn't have to be > physically contiguous. Also the allocation is temporary for the duration of > the > syscall, so it's unlikely to stress vmalloc too much. > > Note that the poll(2) syscall seems to use a linked list of order-0 pages, so > it doesn't need this kind of fallback. > > ... > > --- a/fs/select.c > +++ b/fs/select.c > @@ -29,6 +29,7 @@ > #include > #include > #include > +#include > > #include > > @@ -558,6 +559,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set > __user *outp, > struct fdtable *fdt; > /* Allocate small arguments on the stack to save memory and be faster */ > long stack_fds[SELECT_STACK_ALLOC/sizeof(long)]; > + unsigned long alloc_size; > > ret = -EINVAL; > if (n < 0) > @@ -580,8 +582,12 @@ int core_sys_select(int n, fd_set __user *inp, fd_set > __user *outp, > bits = stack_fds; > if (size > sizeof(stack_fds) / 6) { > /* Not enough space in on-stack array; must use kmalloc */ > + alloc_size = 6 * size; Well. `size' is `unsigned'. The multiplication will be done as 32-bit so there was no point in making `alloc_size' unsigned long. So can we tighten up the types in this function? size_t might make sense, but vmalloc() takes a ulong. > ret = -ENOMEM; > - bits = kmalloc(6 * size, GFP_KERNEL); > + bits = kmalloc(alloc_size, GFP_KERNEL|__GFP_NOWARN); > + if (!bits && alloc_size > PAGE_SIZE) > + bits = vmalloc(alloc_size); I don't share Eric's concerns about performance here. If the vmalloc() is called, we're about to write to that quite large amount of memory which we just allocated, and the vmalloc() overhead will be relatively low. > if (!bits) > goto out_nofds; > } > @@ -618,7 +624,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set > __user *outp, > > out: > if (bits != stack_fds) > - kfree(bits); > + kvfree(bits); > out_nofds: > return ret; It otherwise looks OK to me.
Re: [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
On Thu, 15 Sep 2016 13:34:24 +0800 kbuild test robotwrote: > Hi Johannes, > > [auto build test ERROR on net/master] > [also build test ERROR on v4.8-rc6 next-20160914] > [if your patch is applied to the wrong git tree, please drop us a note to > help improve the system] > [Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for > convenience) to record what (public, well-known) commit your patch series was > built on] > [Check https://git-scm.com/docs/git-format-patch for more information] > > url: > https://github.com/0day-ci/linux/commits/Johannes-Weiner/mm-memcontrol-make-per-cpu-charge-cache-IRQ-safe-for-socket-accounting/20160915-035634 > config: m68k-sun3_defconfig (attached as .config) > compiler: m68k-linux-gcc (GCC) 4.9.0 > reproduce: > wget > https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross > -O ~/bin/make.cross > chmod +x ~/bin/make.cross > # save the attached .config to linux build tree > make.cross ARCH=m68k > > All errors (new ones prefixed by >>): > >net/built-in.o: In function `sk_alloc': > >> (.text+0x4076): undefined reference to `mem_cgroup_sk_alloc' >net/built-in.o: In function `__sk_destruct': > >> sock.c:(.text+0x457e): undefined reference to `mem_cgroup_sk_free' >net/built-in.o: In function `sk_clone_lock': >(.text+0x4f1c): undefined reference to `mem_cgroup_sk_alloc' This? --- a/mm/memcontrol.c~mm-memcontrol-consolidate-cgroup-socket-tracking-fix +++ a/mm/memcontrol.c @@ -5655,9 +5655,6 @@ void mem_cgroup_sk_alloc(struct sock *sk { struct mem_cgroup *memcg; - if (!mem_cgroup_sockets_enabled) - return; - /* * Socket cloning can throw us here with sk_memcg already * filled. It won't however, necessarily happen from --- a/net/core/sock.c~mm-memcontrol-consolidate-cgroup-socket-tracking-fix +++ a/net/core/sock.c @@ -1385,7 +1385,8 @@ static void sk_prot_free(struct proto *p slab = prot->slab; cgroup_sk_free(>sk_cgrp_data); - mem_cgroup_sk_free(sk); + if (mem_cgroup_sockets_enabled) + mem_cgroup_sk_free(sk); security_sk_free(sk); if (slab != NULL) kmem_cache_free(slab, sk); @@ -1422,7 +1423,8 @@ struct sock *sk_alloc(struct net *net, i sock_net_set(sk, net); atomic_set(>sk_wmem_alloc, 1); - mem_cgroup_sk_alloc(sk); + if (mem_cgroup_sockets_enabled) + mem_cgroup_sk_alloc(sk); cgroup_sk_alloc(>sk_cgrp_data); sock_update_classid(>sk_cgrp_data); sock_update_netprioidx(>sk_cgrp_data); @@ -1569,7 +1571,8 @@ struct sock *sk_clone_lock(const struct newsk->sk_incoming_cpu = raw_smp_processor_id(); atomic64_set(>sk_cookie, 0); - mem_cgroup_sk_alloc(newsk); + if (mem_cgroup_sockets_enabled) + mem_cgroup_sk_alloc(newsk); cgroup_sk_alloc(>sk_cgrp_data); /* _
Re: [PATCH] remove lots of IS_ERR_VALUE abuses
On Fri, 27 May 2016 23:23:25 +0200 Arnd Bergmannwrote: > Most users of IS_ERR_VALUE() in the kernel are wrong, as they > pass an 'int' into a function that takes an 'unsigned long' > argument. This happens to work because the type is sign-extended > on 64-bit architectures before it gets converted into an > unsigned type. > > However, anything that passes an 'unsigned short' or 'unsigned int' > argument into IS_ERR_VALUE() is guaranteed to be broken, as are > 8-bit integers and types that are wider than 'unsigned long'. > > Andrzej Hajda has already fixed a lot of the worst abusers that > were causing actual bugs, but it would be nice to prevent any > users that are not passing 'unsigned long' arguments. > > This patch changes all users of IS_ERR_VALUE() that I could find > on 32-bit ARM randconfig builds and x86 allmodconfig. For the > moment, this doesn't change the definition of IS_ERR_VALUE() > because there are probably still architecture specific users > elsewhere. So you do plan to add some sort of typechecking into IS_ERR_VALUE()? > Almost all the warnings I got are for files that are better off > using 'if (err)' or 'if (err < 0)'. > The only legitimate user I could find that we get a warning for > is the (32-bit only) freescale fman driver, so I did not remove > the IS_ERR_VALUE() there but changed the type to 'unsigned long'. > For 9pfs, I just worked around one user whose calling conventions > are so obscure that I did not dare change the behavior. > > I was using this definition for testing: > > #define IS_ERR_VALUE(x) ((unsigned long*)NULL == (typeof (x)*)NULL && \ >unlikely((unsigned long long)(x) >= (unsigned long > long)(typeof(x))-MAX_ERRNO)) > > which ends up making all 16-bit or wider types work correctly with > the most plausible interpretation of what IS_ERR_VALUE() was supposed > to return according to its users, but also causes a compile-time > warning for any users that do not pass an 'unsigned long' argument. > > I suggested this approach earlier this year, but back then we ended > up deciding to just fix the users that are obviously broken. After > the initial warning that caused me to get involved in the discussion > (fs/gfs2/dir.c) showed up again in the mainline kernel, Linus > asked me to send the whole thing again.
Re: [Y2038] [RESEND PATCH 2/3] fs: poll/select/recvmmsg: use timespec64 for timeout events
On Wed, 04 May 2016 23:08:11 +0200 Arnd Bergmannwrote: > > But I'm less comfortable making the call on this one. It looks > > relatively straight forward, but it would be good to have maintainer > > acks before I add it to my tree. > > Agreed. Feel free to add my > > Reviewed-by: Arnd Bergmann > > at least (whoever picks it up). In reply to [1/3] John said : Looks ok at the first glance. I've queued these up for testing, : however I only got #1 and #3 of the set. Are you hoping these two : patches will go through tip/timers/core or are you looking for acks so : they can go via another tree? However none of the patches are in linux-next. John had qualms about [2/3], but it looks like a straightforward substitution in areas which will get plenty of testing I'll grab the patches for now to get them some external testing.
Re: [Bug 117521] New: BUG: unable to handle kernel paging request at 000001a400015ff4
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). Thanks. It's probably a TIPC issue. On Mon, 02 May 2016 16:44:18 + bugzilla-dae...@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=117521 > > Bug ID: 117521 >Summary: BUG: unable to handle kernel paging request at > 01a400015ff4 >Product: Memory Management >Version: 2.5 > Kernel Version: 4.4.0 > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: high > Priority: P1 > Component: Other > Assignee: a...@linux-foundation.org > Reporter: gbala...@gmail.com > Regression: No > > [ 65.954959] sm-msp-queue[1279]: unable to qualify my own domain name > (dcsx5testslot3) -- using short name > [ 632.098785] perf interrupt took too long (2505 > 2500), lowering > kernel.perf_event_max_sample_rate to 5 > [ 5880.428123] perf interrupt took too long (5585 > 5000), lowering > kernel.perf_event_max_sample_rate to 25000 > [17934.014969] CE: hpet increased min_delta_ns to 20115 nsec > [38956.721789] CE: hpet4 increased min_delta_ns to 20115 nsec > [46927.872827] hrtimer: interrupt took 63361 ns > [101662.241093] CE: hpet2 increased min_delta_ns to 20115 nsec > [245973.044600] CE: hpet6 increased min_delta_ns to 20115 nsec > [368639.565040] show_signal_msg: 6 callbacks suppressed > [375832.498126] BUG: unable to handle kernel paging request at > 01a400015ff4 > [375832.505300] IP: [] queued_spin_lock_slowpath+0xe6/0x160 > [375832.512394] PGD 0 > [375832.514657] Oops: 0002 [#1] SMP > [375832.518306] Modules linked in: nf_log_ipv6 nf_log_ipv4 nf_log_common > xt_LOG > sctp libcrc32c e1000e tipc udp_tunnel ip6_udp_tunnel 8021q garp iTCO_wdt > xt_physdev br_netfilter bridge stp llc nf_conntrack_ipv4 nf_defrag_ipv4 > ipmiq_drv(O) sio_mmc(O) ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 > nf_defrag_ipv6 xt_state nf_conntrack lockd ip6table_filter event_drv(O) > ip6_tables grace pt_timer_info(O) ddi(O) usb_storage ixgbe igb i2c_i801 > iTCO_vendor_support i2c_algo_bit ioatdma intel_ips i2c_core pcspkr sunrpc ptp > mdio dca pps_core lpc_ich tpm_tis mfd_core tpm [last unloaded: iTCO_wdt] > [375832.573693] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G O4.4.0 > #14 > [375832.581385] Hardware name: PT AMC124/Base Board Product Name, BIOS > LGNAJFIP.PTI.0012.P15 01/15/2014 > [375832.591028] task: 880351a89b40 ti: 880351a9 task.ti: > 880351a9 > [375832.599026] RIP: 0010:[] [] > queued_spin_lock_slowpath+0xe6/0x160 > [375832.608964] RSP: 0018:88035fc83d58 EFLAGS: 00010002 > [375832.614825] RAX: 1447 RBX: 0292 RCX: > 88035fc95fc0 > [375832.622743] RDX: 01a400015ff4 RSI: 0014 RDI: > 880351232f80 > [375832.630567] RBP: 88035fc83d58 R08: 0101 R09: > 0004 > [375832.638348] R10: R11: R12: > 01001002 > [375832.645919] R13: 0001 R14: R15: > > [375832.653610] FS: () GS:88035fc8() > knlGS: > [375832.662317] CS: 0010 DS: ES: CR0: 8005003b > [375832.668483] CR2: 01a400015ff4 CR3: 01c0a000 CR4: > 06e0 > [375832.676133] Stack: > [375832.678344] 88035fc83d78 816de2c1 88034a8bba60 > 880351232f80 > [375832.686163] 88035fc83db8 810bc592 88035fc83dc8 > 880351758000 > [375832.694139] 01001002 b802f4bd > a024e6f0 > [375832.702154] Call Trace: > [375832.704844] > [375832.707018] [] _raw_spin_lock_irqsave+0x31/0x40 > [375832.713970] [] __wake_up+0x32/0x70 > [375832.719444] [] ? tipc_recv_stream+0x370/0x370 [tipc] > [375832.726589] [] sock_def_wakeup+0x30/0x40 > [375832.732566] [] tipc_sk_timeout+0x148/0x180 [tipc] > [375832.739388] [] ? tipc_recv_stream+0x370/0x370 [tipc] > [375832.746507] [] call_timer_fn+0x44/0x110 > [375832.752378] [] ? cascade+0x4a/0x80 > [375832.757848] [] ? tipc_recv_stream+0x370/0x370 [tipc] > [375832.764871] [] run_timer_softirq+0x22c/0x280 > [375832.771175] [] __do_softirq+0xc8/0x260 > [375832.776958] [] irq_exit+0x83/0xb0 > [375832.782369] [] do_IRQ+0x65/0xf0 > [375832.787607] [] common_interrupt+0x7f/0x7f > [375832.793709] > [375832.795803] [] ? cpuidle_enter_state+0xad/0x200 > [375832.802765] [] ? cpuidle_enter_state+0x91/0x200 > [375832.809338] [] cpuidle_enter+0x17/0x20 > [375832.815155] [] call_cpuidle+0x37/0x60 > [375832.821184] [] ? cpuidle_select+0x13/0x20 > [375832.827249] [] cpu_startup_entry+0x211/0x2d0 > [375832.833535] [] start_secondary+0x103/0x130 > [375832.839759] Code: 87 47 02 c1 e0 10 85 c0 74 38 48 89 c2 c1 e8 12 48 c1 ea > 0c 83 e8 01 83 e2 30 48 98 48 81 c2 c0 5f 01 00 48 03 14 c5
Re: [PATCH] mm: memcontrol: only manage socket pressure for CONFIG_INET
On Wed, 9 Dec 2015 13:58:58 -0500 Johannes Weinerwrote: > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 6faea81e66d7..73cd572167bb 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -4220,13 +4220,13 @@ mem_cgroup_css_online(struct cgroup_subsys_state > > *css) > > if (ret) > > return ret; > > > > +#ifdef CONFIG_INET > > #ifdef CONFIG_MEMCG_LEGACY_KMEM > > ret = tcp_init_cgroup(memcg); > > if (ret) > > return ret; > > #endif > > The calls to tcp_init_cgroup() appear earlier in the series than "mm: > memcontrol: hook up vmpressure to socket pressure". However, they get > moved around a few times so fixing it earlier means respinning the > series. Andrew, it's up to you whether we take the bisectability hit > for !CONFIG_INET && CONFIG_MEMCG (how common is this?) or whether you > want me to resend the series. hm, drat, I was suspecting dependency issues here, but a test build said it was OK. Actually, I was expecting this patch series to depend on the linux-next cgroup2 changes, but that doesn't appear to be the case. *should* this series be staged after the cgroup2 code? Regarding this particular series: yes, I think we can live with a bisection hole for !CONFIG_INET && CONFIG_MEMCG users. But I'm not sure why we're discussing bisection issues, because Arnd's build failure occurs with everything applied? > Sorry about the trouble. I don't have a git tree on kernel.org because > we don't really use git in -mm, but the downside is that we don't get > the benefits of the automatic build testing for all kinds of configs. > I'll try to set up a git tree to expose series to full build coverage > before they hit -mm and -next. This sort of thing happens quite rarely. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] mm: memcontrol: only manage socket pressure for CONFIG_INET
On Wed, 9 Dec 2015 18:05:05 -0500 Johannes Weiner <han...@cmpxchg.org> wrote: > On Wed, Dec 09, 2015 at 02:28:36PM -0800, Andrew Morton wrote: > > On Wed, 9 Dec 2015 13:58:58 -0500 Johannes Weiner <han...@cmpxchg.org> > > wrote: > > > The calls to tcp_init_cgroup() appear earlier in the series than "mm: > > > memcontrol: hook up vmpressure to socket pressure". However, they get > > > moved around a few times so fixing it earlier means respinning the > > > series. Andrew, it's up to you whether we take the bisectability hit > > > for !CONFIG_INET && CONFIG_MEMCG (how common is this?) or whether you > > > want me to resend the series. > > > > hm, drat, I was suspecting dependency issues here, but a test build > > said it was OK. > > > > Actually, I was expecting this patch series to depend on the linux-next > > cgroup2 changes, but that doesn't appear to be the case. *should* this > > series be staged after the cgroup2 code? > > Code-wise they are independent. My stuff is finishing up the new memcg > control knobs, the cgroup2 stuff is changing how and when those knobs > are exposed from within the cgroup core. I'm not relying on any recent > changes in the cgroup core AFAICS, so the order shouldn't matter here. OK, thanks. > > Regarding this particular series: yes, I think we can live with a > > bisection hole for !CONFIG_INET && CONFIG_MEMCG users. But I'm not > > sure why we're discussing bisection issues, because Arnd's build > > failure occurs with everything applied? > > Arnd's patches apply to the top of the stack, but they address issues > introduced early in the series and the problematic code gets touched a > lot in subsequent patches. E.g. the first build breakage is in ("net: > tcp_memcontrol: simplify linkage between socket and page counter") > when the tcp_init_cgroup() and tcp_destroy_cgroup() function calls get > moved around and lose the CONFIG_INET protection. Yeah, this is a pain. I think I'll fold Arnd's fix into mm-memcontrol-introduce-config_memcg_legacy_kmem.patch (which is staged after all the other MM patches and after linux-next) and will pretend I didn't know about the issue ;) > Anyway, if we can live with the bisection caveat then Arnd's fixes on > top of the kmem series look good to me. Depending on what Vladimir > thinks we might want to replace the CONFIG_SLOB fix with something > else later on, but that shouldn't be a problem, either. I don't have a fix for the CONFIG_SLOB&_MEMCG issue yet. I agree that it would be best to make the combination work correctly rather than banning it, but that does require a bit of runtime testing. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Intel-wired-lan] [Patch V3 5/9] i40e: Use numa_mem_id() to better support memoryless node
On Wed, 19 Aug 2015 17:18:15 -0700 (PDT) David Rientjeswrote: > On Wed, 19 Aug 2015, Patil, Kiran wrote: > > > Acked-by: Kiran Patil > > Where's the call to preempt_disable() to prevent kernels with preemption > from making numa_node_id() invalid during this iteration? David asked this question twice, received no answer and now the patch is in the maintainer tree, destined for mainline. If I was asked this question I would respond The use of numa_mem_id() is racy and best-effort. If the unlikely race occurs, the memory allocation will occur on the wrong node, the overall result being very slightly suboptimal performance. The existing use of numa_node_id() suffers from the same issue. But I'm not the person proposing the patch. Please don't just ignore reviewer comments! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: simplify configfs attributes V2
On Sat, 3 Oct 2015 15:32:36 +0200 Christoph Hellwigwrote: > This series consolidates the code to implement configfs attributes > by providing the ->show and ->store method in common code and using > container_of in the methods to access the containing structure. > > This reduces source and binary size of configfs consumers a lot. There's a version of this patch series (I assume V1) in linux-next via Nicholas's tree so my hands are tied. I trust that Nicholas will update things. Or maybe it was a slipup - modifying usb, ocfs2 etc from the iscsi tree is innovative ;) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [MM PATCH V4.1 5/6] slub: support for bulk free with SLUB freelists
On Fri, 2 Oct 2015 15:40:39 +0200 Jesper Dangaard Brouerwrote: > > Thus, I need introducing new code like this patch and at the same time > > have to reduce the number of instruction-cache misses/usage. In this > > case we solve the problem by kmem_cache_free_bulk() not getting called > > too often. Thus, +17 bytes will hopefully not matter too much... but on > > the other hand we sort-of know that calling kmem_cache_free_bulk() will > > cause icache misses. > > I just tested this change on top of my net-use-case patchset... and for > some strange reason the code with this WARN_ON is faster and have much > less icache-misses (1,278,276 vs 2,719,158 L1-icache-load-misses). > > Thus, I think we should keep your fix. > > I cannot explain why using WARN_ON() is better and cause less icache > misses. And I hate when I don't understand every detail. > > My theory is, after reading the assembler code, that the UD2 > instruction (from BUG_ON) cause some kind of icache decoder stall > (Intel experts???). Now that should not be a problem, as UD2 is > obviously placed as an unlikely branch and left at the end of the asm > function call. But the call to __slab_free() is also placed at the end > of the asm function (gets inlined from slab_free() as unlikely). And > it is actually fairly likely that bulking is calling __slab_free (slub > slowpath call). Yes, I was looking at the asm code and the difference is pretty small: a not-taken ud2 versus a not-taken "call warn_slowpath_null", mainly. But I wouldn't assume that the microbenchmarking is meaningful. I've seen shockingly large (and quite repeatable) microbenchmarking differences from small changes in code which isn't even executed (and this is one such case, actually). You add or remove just one byte of text and half the kernel (or half the .o file?) gets a different alignment and this seems to change everything. Deleting the BUG altogether sounds the best solution. As long as the kernel crashes in some manner, we'll be able to work out what happened. And it's cant-happen anyway, isn't it? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [MM PATCH V4.1 5/6] slub: support for bulk free with SLUB freelists
On Wed, 30 Sep 2015 13:44:19 +0200 Jesper Dangaard Brouerwrote: > Make it possible to free a freelist with several objects by adjusting > API of slab_free() and __slab_free() to have head, tail and an objects > counter (cnt). > > Tail being NULL indicate single object free of head object. This > allow compiler inline constant propagation in slab_free() and > slab_free_freelist_hook() to avoid adding any overhead in case of > single object free. > > This allows a freelist with several objects (all within the same > slab-page) to be free'ed using a single locked cmpxchg_double in > __slab_free() and with an unlocked cmpxchg_double in slab_free(). > > Object debugging on the free path is also extended to handle these > freelists. When CONFIG_SLUB_DEBUG is enabled it will also detect if > objects don't belong to the same slab-page. > > These changes are needed for the next patch to bulk free the detached > freelists it introduces and constructs. > > Micro benchmarking showed no performance reduction due to this change, > when debugging is turned off (compiled with CONFIG_SLUB_DEBUG). > checkpatch says WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON() #205: FILE: mm/slub.c:2888: + BUG_ON(!size); Linus will get mad at you if he finds out, and we wouldn't want that. --- a/mm/slub.c~slub-optimize-bulk-slowpath-free-by-detached-freelist-fix +++ a/mm/slub.c @@ -2885,7 +2885,8 @@ static int build_detached_freelist(struc /* Note that interrupts must be enabled when calling this function. */ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p) { - BUG_ON(!size); + if (WARN_ON(!size)) + return; do { struct detached_freelist df; _ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [linux-next] oops in ip_route_input_noref
On Thu, 17 Sep 2015 10:58:52 +0200 Thierry Redingwrote: > On Wed, Sep 16, 2015 at 09:04:15AM -0600, David Ahern wrote: > > On 9/16/15 9:00 AM, Fabio Estevam wrote: > > >On Wed, Sep 16, 2015 at 6:24 AM, Sergey Senozhatsky > > > wrote: > > > > > >>added by b7503e0cdb5dbec5d201aa69dc14679b5ae8 > > >> > > >> net: Add FIB table id to rtable > > >> > > >> Add the FIB table id to rtable to make the information available for > > >> IPv4 as it is for IPv6. > > > > > >I see the same issue here when booting a mx25 ARM processor via NFS. > > > > > >defconfig is arch/arm/configs/imx_v4_v5_defconfig. > > > > > > > I am still not able to reproduce. While I work on a full Cumulus image for > > other test cases here's a patch to try; eagle eye Nikolay noted a potential > > use without init in the maze of goto's. > > > > Thanks, > > David > > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c > > index da427a4a33fe..80f7c5b7b832 100644 > > --- a/net/ipv4/route.c > > +++ b/net/ipv4/route.c > > @@ -1712,6 +1712,7 @@ static int ip_route_input_slow(struct sk_buff *skb, > > __be32 daddr, __be32 saddr, > > goto martian_source; > > > > res.fi = NULL; > > + res.table = NULL; > > if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0)) > > goto brd_input; > > > > @@ -1834,6 +1835,7 @@ out: return err; > > RT_CACHE_STAT_INC(in_no_route); > > res.type = RTN_UNREACHABLE; > > res.fi = NULL; > > + res.table = NULL; > > goto local_input; > > > > /* > > I was seeing the same oops as Fabio (except that the faulting address > was 0xb instead of 0x7) and after applying this patch I no longer see > it: > > Tested-by: Thierry Reding I've been hitting this as well. An oops on boot in ip_route_input_slow(), here: #ifdef CONFIG_IP_ROUTE_CLASSID rth->dst.tclassid = itag; #endif rth->rt_is_input = 1; if (res.table) -->>rth->rt_table_id = res.table->tb_id; RT_CACHE_STAT_INC(in_slow_tot); I did this, which made it go away: --- a/net/ipv4/route.c~a +++ a/net/ipv4/route.c @@ -1692,6 +1692,8 @@ static int ip_route_input_slow(struct sk struct net*net = dev_net(dev); bool do_cache; + res.table = 0; + /* IP on this device is disabled. */ if (!in_dev) _ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] slub: improve bulk alloc strategy
On Mon, 15 Jun 2015 17:52:46 +0200 Jesper Dangaard Brouer bro...@redhat.com wrote: Call slowpath __slab_alloc() from within the bulk loop, as the side-effect of this call likely repopulates c-freelist. Choose to reenable local IRQs while calling slowpath. Saving some optimizations for later. E.g. it is possible to extract parts of __slab_alloc() and avoid the unnecessary and expensive (37 cycles) local_irq_{save,restore}. For now, be happy calling __slab_alloc() this lower icache impact of this func and I don't have to worry about correctness. ... --- a/mm/slub.c +++ b/mm/slub.c @@ -2776,8 +2776,23 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, for (i = 0; i size; i++) { void *object = c-freelist; - if (!object) - break; + if (unlikely(!object)) { + c-tid = next_tid(c-tid); + local_irq_enable(); + + /* Invoke slow path one time, then retry fastpath + * as side-effect have updated c-freelist + */ That isn't very grammatical. Block comments are formatted /* * like this */ please. + p[i] = __slab_alloc(s, flags, NUMA_NO_NODE, + _RET_IP_, c); + if (unlikely(!p[i])) { + __kmem_cache_free_bulk(s, i, p); + return false; + } + local_irq_disable(); + c = this_cpu_ptr(s-cpu_slab); + continue; /* goto for-loop */ + } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/7] slab: infrastructure for bulk object allocation and freeing
On Mon, 15 Jun 2015 17:51:56 +0200 Jesper Dangaard Brouer bro...@redhat.com wrote: +bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, + void **p) +{ + return kmem_cache_alloc_bulk(s, flags, size, p); +} hm, any call to this function is going to be nasty, brutal and short. --- a/mm/slab.c~slab-infrastructure-for-bulk-object-allocation-and-freeing-v3-fix +++ a/mm/slab.c @@ -3425,7 +3425,7 @@ EXPORT_SYMBOL(kmem_cache_free_bulk); bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p) { - return kmem_cache_alloc_bulk(s, flags, size, p); + return __kmem_cache_alloc_bulk(s, flags, size, p); } EXPORT_SYMBOL(kmem_cache_alloc_bulk); _ -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/7] slub bulk alloc: extract objects from the per cpu slab
On Mon, 15 Jun 2015 17:52:07 +0200 Jesper Dangaard Brouer bro...@redhat.com wrote: From: Christoph Lameter c...@linux.com [NOTICE: Already in AKPM's quilt-queue] First piece: acceleration of retrieval of per cpu objects If we are allocating lots of objects then it is advantageous to disable interrupts and avoid the this_cpu_cmpxchg() operation to get these objects faster. Note that we cannot do the fast operation if debugging is enabled, because we would have to add extra code to do all the debugging checks. And it would not be fast anyway. Note also that the requirement of having interrupts disabled avoids having to do processor flag operations. Allocate as many objects as possible in the fast way and then fall back to the generic implementation for the rest of the objects. ... --- a/mm/slub.c +++ b/mm/slub.c @@ -2759,7 +2759,32 @@ EXPORT_SYMBOL(kmem_cache_free_bulk); bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p) { - return kmem_cache_alloc_bulk(s, flags, size, p); + if (!kmem_cache_debug(s)) { + struct kmem_cache_cpu *c; + + /* Drain objects in the per cpu slab */ + local_irq_disable(); + c = this_cpu_ptr(s-cpu_slab); + + while (size) { + void *object = c-freelist; + + if (!object) + break; + + c-freelist = get_freepointer(s, object); + *p++ = object; + size--; + + if (unlikely(flags __GFP_ZERO)) + memset(object, 0, s-object_size); + } + c-tid = next_tid(c-tid); + + local_irq_enable(); It might be worth adding if (!size) return true; here. To avoid the pointless call to __kmem_cache_alloc_bulk(). It depends on the typical success rate of this allocation loop. Do you know what this is? + } + + return __kmem_cache_alloc_bulk(s, flags, size, p); } EXPORT_SYMBOL(kmem_cache_alloc_bulk); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/7] slub: fix error path bug in kmem_cache_alloc_bulk
On Mon, 15 Jun 2015 17:52:26 +0200 Jesper Dangaard Brouer bro...@redhat.com wrote: The current kmem_cache/SLAB bulking API need to release all objects in case the layer cannot satisfy the full request. If __kmem_cache_alloc_bulk() fails, all allocated objects in array should be freed, but, __kmem_cache_alloc_bulk() can't know about objects allocated by this slub specific kmem_cache_alloc_bulk() function. Can we fold patches 2, 3 and 4 into a single patch? And maybe patch 5 as well. I don't think we need all these development-time increments in the permanent record. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] netconsole: implement extended console support
On Mon, 11 May 2015 12:41:34 -0400 Tejun Heo t...@kernel.org wrote: printk logbuf keeps various metadata and optional key=value dictionary for structured messages, both of which are stripped when messages are handed to regular console drivers. It can be useful to have this metadata and dictionary available to netconsole consumers. This obviously makes logging via netconsole more complete and the sequence number in particular is useful in environments where messages may be lost or reordered in transit - e.g. when netconsole is used to collect messages in a large cluster where packets may have to travel congested hops to reach the aggregator. The lost and reordered messages can easily be identified and handled accordingly using the sequence numbers. printk recently added extended console support which can be selected by setting CON_EXTENDED flag. There's no such thing as CON_EXTENDED. Not sure what this is trying to say. From console driver side, not much changes. The only difference is that the text passed to the write callback is formatted the same way as /dev/kmsg. This patch implements extended console support for netconsole which can be enabled by either prepending + to a netconsole boot param entry or echoing 1 to extended file in configfs. When enabled, netconsole transmits extended log messages with headers identical to /dev/kmsg output. There's one complication due to message fragments. netconsole limits the maximum message size to 1k and messages longer than that are split into multiple fragments. As all extended console messages should carry matching headers and be uniquely identifiable, each extended message fragment carries full copy of the metadata and an extra header field to identify the specific fragment. The optional header is of the form ncfrag=OFF/LEN where OFF is the byte offset into the message body and LEN is the total length. To avoid unnecessarily making printk format extended messages, Extended netconsole is registered with printk when the first extended netconsole is configured. ... +static ssize_t store_extended(struct netconsole_target *nt, + const char *buf, + size_t count) +{ + int extended; + int err; + + if (nt-enabled) { + pr_err(target (%s) is enabled, disable to update parameters\n, +config_item_name(nt-item)); + return -EINVAL; + } What's the reason for the above? It's unclear (to me, at least ;)) what disable means? Specifically what steps must the operator take to successfully perform this operation? A sentence detailing those steps in netconsole.txt would be nice. + err = kstrtoint(buf, 10, extended); + if (err 0) + return err; + if (extended 0 || extended 1) + return -EINVAL; + + nt-extended = extended; + + return strnlen(buf, count); +} + ... +static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg, + int msg_len) +{ + static char buf[MAX_PRINT_CHUNK]; + const char *header, *body; + int offset = 0; + int header_len, body_len; + + if (msg_len = MAX_PRINT_CHUNK) { + netpoll_send_udp(nt-np, msg, msg_len); + return; + } + + /* need to insert extra header fields, detect header and body */ + header = msg; + body = memchr(msg, ';', msg_len); + if (WARN_ON_ONCE(!body)) + return; + + header_len = body - header; + body_len = msg_len - header_len - 1; + body++; + + /* + * Transfer multiple chunks with the following extra header. + * ncfrag=byte-offset/total-bytes + */ + memcpy(buf, header, header_len); + + while (offset body_len) { + int this_header = header_len; + int this_chunk; + + this_header += scnprintf(buf + this_header, + sizeof(buf) - this_header, + ,ncfrag=%d/%d;, offset, body_len); + + this_chunk = min(body_len - offset, + MAX_PRINT_CHUNK - this_header); + if (WARN_ON_ONCE(this_chunk = 0)) + return; + + memcpy(buf + this_header, body + offset, this_chunk); + + netpoll_send_udp(nt-np, buf, this_header + this_chunk); + + offset += this_chunk; + } What protects `buf'? console_sem, I assume? - static char buf[MAX_PRINT_CHUNK]; + static char buf[MAX_PRINT_CHUNK]; /* Protected by console_sem */ wouldn't hurt. +} + +static void write_ext_msg(struct console *con, const char *msg, I've forgotten what's happening with this patchset. There were a few design-level issues raised against an earlier version. What were those and how have they been addressed? -- To unsubscribe from this
Re: [PATCH 00/28] Swap over NFS -v16
On Tue, 26 Feb 2008 11:50:42 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote: On Tue, 2008-02-26 at 17:03 +1100, Neil Brown wrote: On Saturday February 23, [EMAIL PROTECTED] wrote: What is the NFS and net people's take on all of this? Well I'm only vaguely an NFS person, barely a net person, sporadically an mm person, but I've had a look and it seems to mostly make sense. Thanks for taking a look, and giving such elaborate feedback. I'll try and address these issues asap, but first let me reply to a few points here. Neil's overview of what-all-this-is and how-it-all-works is really good. I'd suggest that you take it over, flesh it out and attach it firmly to the patchset. It really helps. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 10097] New: SMP BUG in __nf_conntrack_find
On Mon, 25 Feb 2008 10:44:08 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=10097 Summary: SMP BUG in __nf_conntrack_find Product: Networking Version: 2.5 KernelVersion: 2.6.25-rc3 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Netfilter/Iptables AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Latest working kernel version: 2.6.24.2 Earliest failing kernel version: 2.6.24-rc3 (not checked before) Distribution: Bluewhite64 Hardware Environment: Athlon X2 4200 Software Environment: samba 3.0, 2.6.25-rc3 kernel + HR + tickless + kernel SMP debugging Problem Description: The Samba smbd daemon triggers regularly the following BUG with 2.6.25-rc3: BUG: using smp_processor_id() in preemptible [] code: nmbd/3167 caller is __nf_conntrack_find+0x119/0x150 Pid: 3167, comm: nmbd Not tainted 2.6.25-rc3 #1 Call Trace: [8038f3f4] debug_smp_processor_id+0xc4/0xd0 [80555d79] __nf_conntrack_find+0x119/0x150 [80555dc9] nf_conntrack_find_get+0x19/0x80 [80556914] nf_conntrack_in+0x1a4/0x5a0 [8020bd33] ? restore_args+0x0/0x30 [8059d596] ipv4_conntrack_local+0x66/0x70 [80554362] nf_iterate+0x62/0xa0 [80567050] ? dst_output+0x0/0x10 [80554406] nf_hook_slow+0x66/0xe0 [80567050] ? dst_output+0x0/0x10 [80568825] __ip_local_out+0xa5/0xb0 [80568841] ip_local_out+0x11/0x30 [80568ac1] ip_push_pending_frames+0x261/0x3e0 [80587153] udp_push_pending_frames+0x233/0x3d0 [8058860f] udp_sendmsg+0x30f/0x710 [802328b0] ? default_wake_function+0x0/0x10 [8058f895] inet_sendmsg+0x45/0x80 [80531fcf] sock_sendmsg+0xdf/0x110 [80251270] ? autoremove_wake_function+0x0/0x40 [802374c7] ? hrtick_resched+0x77/0x90 [8025e2b5] ? trace_hardirqs_on+0xd5/0x160 [80531735] ? sockfd_lookup_light+0x45/0x80 [805323da] sys_sendto+0xea/0x120 [80626bcb] ? _spin_unlock_irq+0x2b/0x60 [8025e2b5] ? trace_hardirqs_on+0xd5/0x160 [80626bd6] ? _spin_unlock_irq+0x36/0x60 [8020b6db] system_call_after_swapgs+0x7b/0x80 Steps to reproduce: Start smbd with the forementionned kernel instrumented for SMP and kernel debugging and hr + tickless enabled. Presumably this is in NF_CT_STAT_INC(). I wonder what caused it to start happening. Guys, this probably means that the developers who tested this change aren't enabling the debug options which all kernel developers _should_ be enabling when testing their code! Documentation/SubmitChecklist has a handy list. Should NF_CT_STAT_INC() be using local_inc()? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 10071] New: kernel hang in inet_init
On Sat, 23 Feb 2008 00:34:06 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=10071 Summary: kernel hang in inet_init Product: Networking Version: 2.5 KernelVersion: 2.6.25 rc2 latest git Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: IPV4 AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Distribution:ubuntu hardy Hardware Environment:dell dimension 5150 Problem Description: kernel hang mostly but boot slowly sometimes Steps to reproduce: boot when the computer manage to boot, inet_init last some time: [1.725306] NET: Registered protocol family 2 [6.825384] IP route cache hash table entries: 32768 (order: 5, 131072 bytes) [6.825803] TCP established hash table entries: 131072 (order: 8, 1048576 bytes) [6.826691] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes) [6.836160] TCP: Hash tables configured (established 131072 bind 65536) [6.836255] TCP reno registered [7.081569] initcall 0xc03b8920: inet_init+0x0/0x3aa() returned 0. [7.081569] initcall 0xc03b8920 ran for 5106 msecs: inet_init+0x0/0x3aa() When booting with acpi=ht it works: [0.124668] Calling initcall 0xc03b8920: inet_init+0x0/0x3aa() [0.124867] NET: Registered protocol family 2 [0.288067] IP route cache hash table entries: 32768 (order: 5, 131072 bytes) [0.288856] TCP established hash table entries: 131072 (order: 8, 1048576 bytes) [0.289739] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes) [0.299156] TCP: Hash tables configured (established 131072 bind 65536) [0.299250] TCP reno registered Feb 23 09:10:28 tiyann kernel: [0.162267] initcall 0xc03b8920: inet_init+0x0/0x3aa() returned 0. [0.162383] initcall 0xc03b8920 ran for 33 msecs: inet_init+0x0/0x3aa() there seem to be a weird interaction between acpi and inet_init any hint on how to provide better information thanks -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: printk_ratelimit and net_ratelimit conflict and tunable behavior
On Mon, 25 Feb 2008 14:36:40 -0600 Steven Hawkes [EMAIL PROTECTED] wrote: From: Steve Hawkes [EMAIL PROTECTED] The printk_ratelimit() and net_ratelimit() functions each have their own tunable parameters to control their respective rate limiting feature, but they share common state variables, preventing independent tuning of the parameters from working correctly. Also, changes to rate limiting tunable parameters do not always take effect properly since state is not recomputed when changes occur. For example, if ratelimit_burst is increased while rate limiting is occurring, the change won't take full effect until at least enough time between messages occurs so that the toks value reaches ratelimit_burst * ratelimit_jiffies. This can result in messages being suppressed when they should be allowed. Implement independent state for printk_ratelimit() and net_ratelimit(), and update state when tunables are changed. This patch causes a large and nasty reject. --- --- linux-2.6.24/include/linux/kernel.h 2008-01-24 16:58:37.0 -0600 +++ linux-2.6.24-printk_ratelimit/include/linux/kernel.h 2008-02-21 11:20:41.751197312 -0600 Probably because you patched 2.6.24. We're developing 2.6.25 now, and the difference between the two is very large inded. Please raise patches against Linus's latest tree? There are other patches pending against printk.c (in -mm and in git-sched) but afacit they won't collide. @@ -196,8 +196,19 @@ static inline int log_buf_copy(char *des unsigned long int_sqrt(unsigned long); +struct printk_ratelimit_state +{ Please do struct printk_ratelimit_state { + unsigned long toks; + unsigned long last_jiffies; + int missed; + int limit_jiffies; + int limit_burst; + char const *facility; +}; I find that the best-value comments one can add to kernel code are to the members of structures. If the reader understands what all the fields do, the code becomes simple to follow. --- linux-2.6.24/net/core/utils.c 2008-01-24 16:58:37.0 -0600 +++ linux-2.6.24-printk_ratelimit/net/core/utils.c2008-02-21 11:03:44.644337698 -0600 @@ -41,7 +41,16 @@ EXPORT_SYMBOL(net_msg_warn); */ int net_ratelimit(void) { - return __printk_ratelimit(net_msg_cost, net_msg_burst); + static struct printk_ratelimit_state limit_state = { + .toks = 10 * 5 * HZ, + .last_jiffies = 0, + .missed= 0, + .limit_jiffies = 5 * HZ, + .limit_burst = 10, + .facility = net + }; + + return __printk_ratelimit(net_msg_cost, net_msg_burst, limit_state); I don't get it. There's one instance of limit_state, kernel-wide, and __printk_ratelimit() modifies it. What prevents one CPU's activities from interfering with a second CPU's activities? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 10073] New: Just-small-enough packets in tunnels are silently eaten
On Sat, 23 Feb 2008 09:17:14 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=10073 Summary: Just-small-enough packets in tunnels are silently eaten Product: Networking Version: 2.5 KernelVersion: 2.6.23 (mainline), 2.6.25-rc2 (mainline), 2.6.18-6-amd64 (Debian Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: IPV6 AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Hi, This has been broken for quite a while, but I haven't gotten around to debug it until now. I have an IPv6-in-IPv4 tunnel between two points, both with MTU 1500 on the regular v4 network. (I've verified that I can indeed send 1500-byte packets and fragments over IPv4 from the two points.) By default, Linux assigns MTU 1480 to this tunnel. However, if I try to ping -s 5000 from one side of the tunnel to the other, I get first Packet too big, mtu=1480 and then on the next packet (when the machine tries to send 1480-byte fragments) Packet too big, mtu=1472. After that, the packet goes through. However, in some cases it seems I do not seem to get the Packet too big ICMP at all. In particular, if I change to a GRE tunnel (where the default MTU is 1476), and send in 1476-byte packets, they are just eaten. They clearly go into the GRE tunnel (according to tcpdump), but no IPv4 packets ever go out on the other side, and no ICMPs are sent back. (There's no iptables rules on the router in question, nor any relevant sysctl settings except that IPv6 forwarding is of course turned on.) If I lower MTU on the interfaces to 1468, everything seems to work just fine. (I cannot change the MTU of a regular ipip tunnel, so it's impossible for me to check whether a lower MTU would have fixed the issues for those as well, but it seems reasonable.) Any idea where the extra eight bytes go? Is there some inner layer of encapsulation in the kernel (adding eight bytes internally) that I've missed? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Nework Emulation with UML, bug 8895
On Sat, 23 Feb 2008 14:04:18 +0100 clowncoder [EMAIL PROTECTED] wrote: Hello, You can have a configured and running network inside a single linux machine, only one script command is enough. After the start of all the machine, a graphical representation of your topology helps your interactions with the network. This virtual network can be downloaded at http://clownix.net, I must warn you, it is a 350mega bz2 file-system that needs 4 giga on the host. Very usefull for networking development in kernel as proved by bug 8895 that has been found with the above tool: The uncorrected bug 8895: file: /usr/src/linux-2.6.24.2/net/ipv6/ip6_fib.c line 796: dst_free(rt-u.dst); --- /Comment From qmiao mailto:[EMAIL PROTECTED] 2007-08-29 23:33:07 /--- fib6_add ... if (fn-leaf == NULL) { fn-leaf = rt;--**-- rt is assigned to fn-leaf atomic_inc(rt-rt6i_ref); } ... err = fib6_add_rt2node(fn, rt, info); -**- return -EEXIST ... (Here err was not null) ... if (err) { ... dst_free(rt-u.dst); --**-- Actually rt is still in tree (fn-leaf = rt /* see above */) ... } Please cc netdev on net-related bug reports. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 10063] New: Network problems with 2.6.23+
(plese respond via emailed reply-to-all, not via the bugzilla web interface) On Thu, 21 Feb 2008 18:04:45 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=10063 There's more info (including a tcpdump) in bugzilla. Summary: Network problems with 2.6.23+ Product: Networking Version: 2.5 KernelVersion: 2.6.24.2 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: high Priority: P1 Component: IPV4 AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Latest working kernel version: 2.6.23 Earliest failing kernel version: 2.6.24 Distribution: Gentoo Hardware Environment: model name : Dual Core AMD Opteron(tm) Processor 170 stepping: 2 cpu MHz : 2009.144 02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15) sky2 Working on 2.6.24 (as far as I can see) in : model name : Intel(R) Pentium(R) 4 CPU 2.80GHz stepping: 9 cpu MHz : 2793.222 01:01.0 Ethernet controller: Intel Corporation 82547GI Gigabit Ethernet Controller Software Environment: TCP-transfers (FTP,HTTP) Problem Description: First I want to apologize that I'm not able to do alot of troubleshooting on this issue cause I don't know where to start exactly. The problem is that I have, after upgrading to 2.6.24 from 2.6.23, experienced alot of weird network problems. For example, FTP-transfers to certain hosts on the Internet stalls after a while. I have one dst-host where this problem appears frequently. On another dst-host it doesn't happen as often. Sometimes the transfers doesn't even start. First we did alot of faultracing on the transmission but after downgrading the kernel it was apparent that the transmission wasn't the problem. I did not think about filing this report until i saw, http://kerneltrap.org/node/15550. So I guess I'm not alone. I must also add that to some dst-hosts there is no problem at all. So I would guess that there might be something with the mtu,wscale or something between hosts. All the other dst-hosts I've tried is running 2.6.23 I will try to test alot of kernel-version to try to ease the troubleshooting. Steps to reproduce: Start an FTP-transfer from the box mention above. I guess it might help if you can tell us the IP address of some of the problematic servers. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 8/8] Jhash in too big for inlining, move under lib/
On Wed, 20 Feb 2008 15:47:18 +0200 Ilpo Järvinen [EMAIL PROTECTED] wrote: vmlinux.o: 62 functions changed, 66 bytes added, 10935 bytes removed, diff: -10869 ...+ these to lib/jhash.o: jhash_3words: 112 jhash2: 276 jhash: 475 select for networking code might need a more fine-grained approach. It should be possible to use a modular jhash.ko. The things which you have identified as clients of the jhash library are usually loaded as modules. But in the case where someone does (say) NFSD=y we do need jhash.o linked into vmlinux also. This is doable in Kconfig but I always forget how. Adrian, Sam and Randy are the repositories of knowledge here ;) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/8]: uninline uninline
On Wed, 20 Feb 2008 15:47:10 +0200 Ilpo J__rvinen [EMAIL PROTECTED] wrote: Ok, here's the top of the list (1+ bytes): This is good stuff - thanks. -41525 2066 f, 3370 +, 44895 -, diff: -41525 IS_ERR This is a surprise. I expect that the -mm-only profile-likely-unlikely-macros.patch is the cause of this and mainline doesn't have this problem. If true, then this likely/unlikely bloat has probably spread into a lot of your other results and it all should be redone against mainline, sorry :( (I'm not aware of anyone having used profile-likely-unlikely-macros.patch in quite some time. That's unfortunate because it has turned up some fairly flagrant code deoptimisations) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: WARNING: at net/ipv4/tcp_input.c:2054 tcp_mark_head_lost()
(cc netdev) On Wed, 20 Feb 2008 20:04:39 -0800 (PST) Giangiacomo Mariotti [EMAIL PROTECTED] wrote: This is what I got with dmesg : [ 266.978695] WARNING: at net/ipv4/tcp_input.c:2054 tcp_mark_head_lost() [ 266.978701] Pid: 0, comm: swapper Not tainted 2.6.24.2-my001 #1 [ 266.978703] [ 266.978704] Call Trace: [ 266.978706] IRQ [80426981] tcp_ack+0x16d8/0x197f [ 266.978721] [8022e72f] __wake_up+0x38/0x4e [ 266.978727] [804295ef] tcp_rcv_established+0xe2/0x8cb [ 266.978732] [8042f56f] tcp_v4_do_rcv+0x30/0x39c [ 266.978738] [80431d29] tcp_v4_rcv+0x99b/0xa06 [ 266.978743] [803f2c95] __netdev_alloc_skb+0x29/0x43 [ 266.978749] [80416d21] ip_local_deliver_finish+0x152/0x212 [ 266.978753] [80416bac] ip_rcv_finish+0x2f8/0x31b [ 266.978758] [803f6c42] netif_receive_skb+0x3ae/0x3d1 [ 266.978763] [8037398f] rtl8169_rx_interrupt+0x45f/0x53e [ 266.978768] [8037405b] rtl8169_poll+0x36/0x16a [ 266.978773] [803f8ca7] net_rx_action+0xb7/0x1f3 [ 266.978778] [8023a3a5] __do_softirq+0x65/0xce [ 266.978782] [8020b0d2] default_idle+0x0/0x3d [ 266.978786] [8020d09c] call_softirq+0x1c/0x28 [ 266.978789] [8020e4f0] do_softirq+0x2c/0x7d [ 266.978792] [8023a2fb] irq_exit+0x3f/0x84 [ 266.978794] [8020e729] do_IRQ+0xb6/0xd5 [ 266.978797] [8020b0d2] default_idle+0x0/0x3d [ 266.978800] [8020c421] ret_from_intr+0x0/0xa [ 266.978801] EOI [8020b0fb] default_idle+0x29/0x3d [ 266.978809] [8020b1a2] cpu_idle+0x93/0xbb [ 266.978813] [805cfa4b] start_kernel+0x2bb/0x2c7 [ 266.978818] [805cf123] _sinittext+0x123/0x12a [ 266.978821] This though didn't cause any user-visible problem. .config file : -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/28] mm: __GFP_MEMALLOC
On Wed, 20 Feb 2008 15:46:19 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote: __GFP_MEMALLOC will allow the allocation to disregard the watermarks, much like PF_MEMALLOC. 'twould be nice if the changelog had some explanation of the reason for this change. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 07/28] mm: emergency pool
On Wed, 20 Feb 2008 15:46:17 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote: @@ -213,7 +213,7 @@ enum zone_type { struct zone { /* Fields commonly accessed by the page allocator */ - unsigned long pages_min, pages_low, pages_high; + unsigned long pages_emerg, pages_min, pages_low, pages_high; It would be nice to make these one-per-line, then document them. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/28] mm: allow PF_MEMALLOC from softirq context
On Wed, 20 Feb 2008 15:46:15 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote: Allow PF_MEMALLOC to be set in softirq context. When running softirqs from a borrowed context save current-flags, ksoftirqd will have its own task_struct. The second sentence doesn't make sense. This is needed to allow network softirq packet processing to make use of PF_MEMALLOC. ... +#define tsk_restore_flags(p, pflags, mask) \ + do {(p)-flags = ~(mask); \ + (p)-flags |= ((pflags) (mask)); } while (0) + Does it need to be a macro? If so, it really should cook up a temporary to avoid referencing p twice - the children might be watching. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/28] Swap over NFS -v16
On Wed, 20 Feb 2008 15:46:10 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote: Another posting of the full swap over NFS series. Well I looked. There's rather a lot of it and I wouldn't pretend to understand it. What is the NFS and net people's take on all of this? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/28] mm: kmem_estimate_pages()
On Wed, 20 Feb 2008 15:46:14 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote: Provide a method to get the upper bound on the pages needed to allocate a given number of objects from a given kmem_cache. This lays the foundation for a generic reserve framework as presented in a later patch in this series. This framework needs to convert object demand (kmalloc() bytes, kmem_cache_alloc() objects) to pages. ... /* + * return the max number of pages required to allocated count + * objects from the given cache + */ +unsigned kmem_estimate_pages(struct kmem_cache *s, gfp_t flags, int objects) You might want to have another go at that comment. +/* + * return the max number of pages required to allocate @bytes from kmalloc + * in an unspecified number of allocation of heterogeneous size. + */ +unsigned kestimate(gfp_t flags, size_t bytes) And its pal. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/28] mm: system wide ALLOC_NO_WATERMARK
On Wed, 20 Feb 2008 15:46:18 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote: Change ALLOC_NO_WATERMARK page allocation such that the reserves are system wide - which they are per setup_per_zone_pages_min(), when we scrape the barrel, do it properly. The changelog is fairly incomprehensible. mm/page_alloc.c |6 ++ 1 file changed, 6 insertions(+) Index: linux-2.6/mm/page_alloc.c === --- linux-2.6.orig/mm/page_alloc.c +++ linux-2.6/mm/page_alloc.c @@ -1552,6 +1552,12 @@ restart: rebalance: if (alloc_flags ALLOC_NO_WATERMARKS) { nofail_alloc: + /* + * break out of mempolicy boundaries + */ + zonelist = NODE_DATA(numa_node_id())-node_zonelists + + gfp_zone(gfp_mask); + /* go through the zonelist yet again, ignoring mins */ page = get_page_from_freelist(gfp_mask, order, zonelist, ALLOC_NO_WATERMARKS); As is the patch. People who care about mempolicies will want a better explanation, please, so they can check that we're not busting their stuff. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 15/28] netvm: network reserve infrastructure
On Wed, 20 Feb 2008 15:46:25 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote: Provide the basic infrastructure to reserve and charge/account network memory. We provide the following reserve tree: 1) total network reserve 2)network TX reserve 3) protocol TX pages 4)network RX reserve 5) SKB data reserve [1] is used to make all the network reserves a single subtree, for easy manipulation. [2] and [4] are merely for eastetic reasons. The TX pages reserve [3] is assumed bounded by it being the upper bound of memory that can be used for sending pages (not quite true, but good enough) The SKB reserve [5] is an aggregate reserve, which is used to charge SKB data against in the fallback path. The consumers for these reserves are sockets marked with: SOCK_MEMALLOC Such sockets are to be used to service the VM (iow. to swap over). They must be handled kernel side, exposing such a socket to user-space is a BUG. +/** + * sk_adjust_memalloc - adjust the global memalloc reserve for critical RX + * @socks: number of new %SOCK_MEMALLOC sockets + * @tx_resserve_pages: number of pages to (un)reserve for TX + * + * This function adjusts the memalloc reserve based on system demand. + * The RX reserve is a limit, and only added once, not for each socket. + * + * NOTE: + * @tx_reserve_pages is an upper-bound of memory used for TX hence + * we need not account the pages like we do for RX pages. + */ +int sk_adjust_memalloc(int socks, long tx_reserve_pages) +{ + int nr_socks; + int err; + + err = mem_reserve_pages_add(net_tx_pages, tx_reserve_pages); + if (err) + return err; + + nr_socks = atomic_read(memalloc_socks); + if (!nr_socks socks 0) + err = mem_reserve_connect(net_reserve, mem_reserve_root); This looks like it should have some locking? + nr_socks = atomic_add_return(socks, memalloc_socks); + if (!nr_socks socks) + err = mem_reserve_disconnect(net_reserve); Or does that try to make up for it? Still looks fishy. + if (err) + mem_reserve_pages_add(net_tx_pages, -tx_reserve_pages); + + return err; +} + +/** + * sk_set_memalloc - sets %SOCK_MEMALLOC + * @sk: socket to set it on + * + * Set %SOCK_MEMALLOC on a socket and increase the memalloc reserve + * accordingly. + */ +int sk_set_memalloc(struct sock *sk) +{ + int set = sock_flag(sk, SOCK_MEMALLOC); +#ifndef CONFIG_NETVM + BUG(); +#endif ?? #error, maybe? + if (!set) { + int err = sk_adjust_memalloc(1, 0); + if (err) + return err; + + sock_set_flag(sk, SOCK_MEMALLOC); + sk-sk_allocation |= __GFP_MEMALLOC; + } + return !set; +} +EXPORT_SYMBOL_GPL(sk_set_memalloc); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 17/28] netvm: hook skb allocation to reserves
On Wed, 20 Feb 2008 15:46:27 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote: Change the skb allocation api to indicate RX usage and use this to fall back to the reserve when needed. SKBs allocated from the reserve are tagged in skb-emergency. Teach all other skb ops about emergency skbs and the reserve accounting. Use the (new) packet split API to allocate and track fragment pages from the emergency reserve. Do this using an atomic counter in page-index. This is needed because the fragments have a different sharing semantic than that indicated by skb_shinfo()-dataref. Note that the decision to distinguish between regular and emergency SKBs allows the accounting overhead to be limited to the later kind. ... +static inline void skb_get_page(struct sk_buff *skb, struct page *page) +{ + get_page(page); + if (skb_emergency(skb)) + atomic_inc(page-frag_count); +} + +static inline void skb_put_page(struct sk_buff *skb, struct page *page) +{ + if (skb_emergency(skb) atomic_dec_and_test(page-frag_count)) + rx_emergency_put(PAGE_SIZE); + put_page(page); +} I'm thinking we should do `#define slowcall inline' then use that in the future. static void skb_release_data(struct sk_buff *skb) { if (!skb-cloned || !atomic_sub_return(skb-nohdr ? (1 SKB_DATAREF_SHIFT) + 1 : 1, skb_shinfo(skb)-dataref)) { + int size; + +#ifdef NET_SKBUFF_DATA_USES_OFFSET + size = skb-end; +#else + size = skb-end - skb-head; +#endif The patch adds rather a lot of ifdefs. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/28] mm: memory reserve management
On Wed, 20 Feb 2008 15:46:20 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote: Generic reserve management code. It provides methods to reserve and charge. Upon this, generic alloc/free style reserve pools could be build, which could fully replace mempool_t functionality. It should also allow for a Banker's algorithm replacement of __GFP_NOFAIL. Generally: the comments in this code are a bit straggly and hard to follow. They'd be worth a revisit. +/* + * Simple output of the reserve tree in: /proc/reserve_info + * Example: + * + * localhost ~ # cat /proc/reserve_info + * total reserve 8156K (0/544817) + * total network reserve 8156K (0/544817) + * network TX reserve 196K (0/49) + * protocol TX pages 196K (0/49) + * network RX reserve 7960K (0/544768) + * IPv6 route cache 1372K (0/4096) + * IPv4 route cache 5468K (0/16384) + * SKB data reserve 1120K (0/524288) + * IPv6 fragment cache560K (0/262144) + * IPv4 fragment cache560K (0/262144) + */ Well, Simple was a freudian typo. Not designed for programmatic parsing, I see. +static __init int mem_reserve_proc_init(void) +{ + struct proc_dir_entry *entry; + + entry = create_proc_entry(reserve_info, S_IRUSR, NULL); I think we're supposed to use proc_create(). Blame Alexey. + if (entry) + entry-proc_fops = mem_reserve_opterations; + + return 0; +} + +__initcall(mem_reserve_proc_init); module_init() is more trendy. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] alloc_percpu() fails to allocate percpu data
On Thu, 21 Feb 2008 19:00:03 +0100 Eric Dumazet [EMAIL PROTECTED] wrote: +#ifndef cache_line_size +#define cache_line_size()L1_CACHE_BYTES +#endif argh, you made me look. Really cache_line_size() should be implemented in include/linux/cache.h. Then we tromp the stupid private implementations in slob.c and slub.c. Then we wonder why x86 uses a custom cache_line_size(), but still uses L1_CACHE_BYTES for its L1_CACHE_ALIGN(). Once we've answered that, we look at your + /* +* We should make sure each CPU gets private memory. +*/ + size = roundup(size, cache_line_size()); and wonder whether it should have used L1_CACHE_ALIGN(). I think I'd better stop looking. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 8/8] Jhash in too big for inlining, move under lib/
On Sat, 23 Feb 2008 12:05:36 +0200 (EET) Ilpo Järvinen [EMAIL PROTECTED] wrote: On Sat, 23 Feb 2008, Andrew Morton wrote: On Wed, 20 Feb 2008 15:47:18 +0200 Ilpo Järvinen [EMAIL PROTECTED] wrote: vmlinux.o: 62 functions changed, 66 bytes added, 10935 bytes removed, diff: -10869 ...+ these to lib/jhash.o: jhash_3words: 112 jhash2: 276 jhash: 475 select for networking code might need a more fine-grained approach. It should be possible to use a modular jhash.ko. The things which you have identified as clients of the jhash library are usually loaded as modules. But in the case where someone does (say) NFSD=y we do need jhash.o linked into vmlinux also. This is doable in Kconfig but I always forget how. Ok, even though its not that likely that one lives without e.g. net/ipv4/inet_connection_sock.c or net/netlink/af_netlink.c? Sure, the number of people who will want CONFIG_JHASH=n is very small. But maybe some guys really know what they are doing and can come up with config that would be able to build it as module (for other than proof-of-concept uses I mean)... :-/ Adrian, Sam and Randy are the repositories of knowledge here ;) Thanks, I'll consult them in this. I've never needed to do any Kconfig stuff so far so it's no surprise I have very little clue... :-) Thanks. If it gets messy I'd just put lib/jhash.o in obj-y. I assume it's pretty small. I've one question for you Andrew, how would you like this kind of cross-subsys toucher to be merged, through you directly I suppose? Sure, I can scrounge around for appropriate reviews and acks and can make sure that nobody's pending work gets disrupted. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/8]: uninline uninline
On Sat, 23 Feb 2008 14:15:06 +0100 Andi Kleen [EMAIL PROTECTED] wrote: Andrew Morton [EMAIL PROTECTED] writes: -41525 2066 f, 3370 +, 44895 -, diff: -41525 IS_ERR This is a surprise. I expect that the -mm-only profile-likely-unlikely-macros.patch is the cause of this and mainline doesn't have this problem. Shouldn't they only have overhead when the respective CONFIG is enabled? yup. If true, then this likely/unlikely bloat has probably spread into a lot of your other results and it all should be redone against mainline, sorry :( (I'm not aware of anyone having used profile-likely-unlikely-macros.patch in quite some time. That's unfortunate because it has turned up some fairly flagrant code deoptimisations) Is there any reason they couldn't just be merged to mainline? I think it's a useful facility. ummm, now why did we made that decision... I think we decided that it's the sort of thing which one person can run once per few months and that will deliver its full value. I can maintain it in -mm and we're happy - no need to add it to mainline. No strong feelings either way really. It does have the downside that the kernel explodes if someone adds unlikely or likely to the vdso code and I need to occasionally hunt down new additions and revert them in that patch. That makes it a bit of a maintenance burden. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 10061] New: Hang in md5_resync
(please respond via emailed reply-to-all, not via the bugzilla web interface, thanks) On Thu, 21 Feb 2008 13:13:13 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=10061 Summary: Hang in md5_resync Product: IO/Storage Version: 2.5 KernelVersion: 2.6.25-rc2 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: MD AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Latest working kernel version: 2.6.23 Earliest failing kernel version: 2.6.25-rc2 Distribution: Debian unstable Hardware Environment: sata_nv, ext3, Seagate SATA disks Software Environment: N/A Problem Description: After upgrading to 2.6.25, I did some light I/O (basically a tab-complete in my home directory), and the whole machine promptly hung. After a while, the serial console spat out: [ 548.684064] BUG: soft lockup - CPU#1 stuck for] [ 548.684065] CPU 1: [ 548.684065] Modules linked in: tun eeprom ide_generic ide_disk ide_cd_mod cdx [ 548.684065] Pid: 3599, comm: md4_resync Not tainted 2.6.25-rc2 #1 [ 548.684065] RIP: 0010:[802fee07] [802fee07] __write_loc0 [ 548.684065] RSP: 0018:81011fcffc18 EFLAGS: 0287 [ 548.684065] RAX: 81011c14dfd8 RBX: 8100d42d30c0 RCX: 8100d42d30c0 [ 548.684065] RDX: RSI: 8100d4cd0d90 RDI: 8100d4cd0d00 [ 548.684065] RBP: 81011fcffb90 R08: fe538a93 R09: [ 548.684065] R10: 00c0 R11: 803ad3e1 R12: 8020bcb6 [ 548.684065] R13: 81011fcffb90 R14: 8100d4cd0d00 R15: [ 548.684065] FS: 7f8748e0c6e0() GS:81011fcb5dc0() knlGS:0 [ 548.684065] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b [ 548.684066] CR2: 7f8748e076f3 CR3: d296b000 CR4: 06e0 [ 548.684066] DR0: DR1: DR2: [ 548.684066] DR3: DR6: 0ff0 DR7: 0400 [ 548.684066] [ 548.684066] Call Trace: [ 548.684066] IRQ [80446fe4] ? _write_lock_bh+0x1a/0x1c [ 548.684066] [803ad49e] ? neigh_resolve_output+0xbd/0x281 [ 548.684066] [80407ab5] ? ip6_output+0xc5c/0xc71 [ 548.684066] [8039d414] ? sock_alloc_send_skb+0x93/0x1e1 [ 548.684066] [8041685b] ? __ndisc_send+0x3c6/0x4fb [ 548.684066] [880a431a] ? :sd_mod:sd_prep_fn+0x74/0x6fb [ 548.684066] [8041725b] ? ndisc_send_ns+0x71/0x7e [ 548.684066] [80363c3c] ? scsi_request_fn+0x339/0x3a5 [ 548.684066] [804181df] ? ndisc_solicit+0x159/0x166 [ 548.684066] [80235673] ? __mod_timer+0xb0/0xbf [ 548.684066] [803adc72] ? neigh_timer_handler+0x292/0x2d0 [ 548.684066] [803ad9e0] ? neigh_timer_handler+0x0/0x2d0 [ 548.684066] [8023521b] ? run_timer_softirq+0x151/0x1bf [ 548.684066] [80231f40] ? __do_softirq+0x65/0xcf [ 548.684066] [8020c20c] ? call_softirq+0x1c/0x28 [ 548.684066] [8020d8b4] ? do_softirq+0x2c/0x68 [ 548.684066] [8021919d] ? smp_apic_timer_interrupt+0x8a/0xa5 [ 548.684066] [8020bcb6] ? apic_timer_interrupt+0x66/0x70 [ 548.684066] EOI [880d549c] ? :raid456:raid5_compute_sector+0x19 [ 548.684066] [880d5511] ? :raid456:stripe_to_pdidx+0x48/0x51 [ 548.684066] [880db728] ? :raid456:sync_request+0x7d3/0x8a1 [ 548.684066] [802edf17] ? generic_unplug_device+0x18/0x24 [ 548.684066] [802ecdef] ? blk_unplug+0x5a/0x60 [ 548.684066] [880d51af] ? :raid456:unplug_slaves+0x55/0x91 [ 548.684066] [880b0884] ? :md_mod:md_do_sync+0x54e/0x96c [ 548.684066] [802429a7] ? getnstimeofday+0x2f/0x83 [ 548.684066] [802263e0] ? try_to_wake_up+0xfd/0x10e [ 548.684066] [880b2ff6] ? :md_mod:md_thread+0xde/0xfa [ 548.684066] [880b2f18] ? :md_mod:md_thread+0x0/0xfa [ 548.684066] [8023e741] ? kthread+0x47/0x76 [ 548.684066] [80228ebd] ? schedule_tail+0x28/0x5c [ 548.684066] [8020be98] ? child_rip+0xa/0x12 [ 548.684066] [8023e6fa] ? kthread+0x0/0x76 [ 548.684066] [8020be8e] ? child_rip+0x0/0x12 [ 548.684066] and the machine continued hanging. Downgraded to 2.6.23 again, seems to work fine. Will try 2.6.24.2 shortly. This looks like a networking regression, not a RAID regression. We did have a big locking bug in netlink in 2.6.25-rc2, although your trace doesn't appear to point at that particular bug. It would be good if you could test the latest Linux snapshot please, see if we've already fixed this, thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED]
Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?
On Sun, 17 Feb 2008 00:54:08 + (GMT) Chris Rankin [EMAIL PROTECTED] wrote: [Try this again, except this time I'll force the attachment as inline text!] Hi, I have managed to boot 2.6.24.1 on this machine, with the NMI watchdog enabled, by using the acpi=noirq option. (There does seem to be some unhappiness with bridge symlinks in sysfs, though.) ... sysfs: duplicate filename 'bridge' can not be created WARNING: at fs/sysfs/dir.c:424 sysfs_add_one() Pid: 1, comm: swapper Not tainted 2.6.24.1 #1 [c0105020] show_trace_log_lvl+0x1a/0x2f [c0105990] show_trace+0x12/0x14 [c010613d] dump_stack+0x6c/0x72 [c01991bf] sysfs_add_one+0x57/0xbc [c0199e41] sysfs_create_link+0xc2/0x10d [c01bae9a] pci_bus_add_devices+0xbd/0x103 [c034016c] pci_legacy_init+0x56/0xe3 [c03274e1] kernel_init+0x157/0x2c3 [c0104c83] kernel_thread_helper+0x7/0x10 === pci :00:01.0: Error creating sysfs bridge symlink, continuing... sysfs: duplicate filename 'bridge' can not be created WARNING: at fs/sysfs/dir.c:424 sysfs_add_one() Pid: 1, comm: swapper Not tainted 2.6.24.1 #1 [c0105020] show_trace_log_lvl+0x1a/0x2f [c0105990] show_trace+0x12/0x14 [c010613d] dump_stack+0x6c/0x72 [c01991bf] sysfs_add_one+0x57/0xbc [c0199e41] sysfs_create_link+0xc2/0x10d [c01bae9a] pci_bus_add_devices+0xbd/0x103 [c01bae82] pci_bus_add_devices+0xa5/0x103 [c034016c] pci_legacy_init+0x56/0xe3 [c03274e1] kernel_init+0x157/0x2c3 [c0104c83] kernel_thread_helper+0x7/0x10 === I have a vague feeling that this was fixed, perhaps in 2.6.24.x? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6.25-rc2] e100: Trying to free already-free IRQ 11 during suspend ...
On Sun, 17 Feb 2008 15:36:50 +0300 Andrey Borzenkov [EMAIL PROTECTED] wrote: ... and possibly reboot/poweroff (it flows by too fast to be legible). [ 8803.850634] ACPI: Preparing to enter system sleep state S3 [ 8803.853141] Suspending console(s) [ 8805.287505] serial 00:09: disabled [ 8805.291564] Trying to free already-free IRQ 11 [ 8805.291579] Pid: 6920, comm: pm-suspend Not tainted 2.6.25-rc2-1avb #2 [ 8805.291628] [c0152127] free_irq+0xb7/0x130 [ 8805.291675] [c024bd80] e100_suspend+0xc0/0x100 [ 8805.291724] [c01eaa36] pci_device_suspend+0x26/0x70 [ 8805.291747] [c0243674] suspend_device+0x94/0xd0 [ 8805.291763] [c02439a3] device_suspend+0x153/0x240 [ 8805.291784] [c014314f] suspend_devices_and_enter+0x4f/0xf0 [ 8805.291808] [c0143a5f] ? freeze_processes+0x3f/0x80 [ 8805.291825] [c01432fa] enter_state+0xaa/0x140 [ 8805.291840] [c014341f] state_store+0x8f/0xd0 [ 8805.291852] [c0143390] ? state_store+0x0/0xd0 [ 8805.291866] [c01d3404] kobj_attr_store+0x24/0x30 [ 8805.291901] [c01b547b] sysfs_write_file+0xbb/0x110 [ 8805.291936] [c0177d79] vfs_write+0x99/0x130 [ 8805.291963] [c01b53c0] ? sysfs_write_file+0x0/0x110 [ 8805.291979] [c01782fd] sys_write+0x3d/0x70 [ 8805.291998] [c010409a] sysenter_past_esp+0x5f/0xa5 [ 8805.292038] === [ 8805.347640] ACPI: PCI interrupt for device :00:06.0 disabled [ 8805.361128] ACPI: PCI interrupt for device :00:02.0 disabled [ 8805.376670] hwsleep-0322 [00] enter_sleep_state : Entering sleep state [S3] [ 8805.376670] Back to C! Interface is unused normally (only for netconsole sometimes). dmesg and config attached. Does reverting this: commit 8543da6672b0994921f014f2250e27ae81645580 Author: Auke Kok [EMAIL PROTECTED] Date: Wed Dec 12 16:30:42 2007 -0800 e100: free IRQ to remove warningwhenrebooting with this patch: --- a/drivers/net/e100.c~revert-1 +++ a/drivers/net/e100.c @@ -2804,9 +2804,8 @@ static int e100_suspend(struct pci_dev * pci_enable_wake(pdev, PCI_D3cold, 0); } - free_irq(pdev-irq, netdev); - pci_disable_device(pdev); + free_irq(pdev-irq, netdev); pci_set_power_state(pdev, PCI_D3hot); return 0; @@ -2848,8 +2847,6 @@ static void e100_shutdown(struct pci_dev pci_enable_wake(pdev, PCI_D3cold, 0); } - free_irq(pdev-irq, netdev); - pci_disable_device(pdev); pci_set_power_state(pdev, PCI_D3hot); } _ fix it? Hmm ... after resume device has disappeared at all ... {pts/1}% cat /proc/interrupts CPU0 0:1290492XT-PIC-XTtimer 1: 6675XT-PIC-XTi8042 2: 0XT-PIC-XTcascade 3: 2XT-PIC-XT 4: 2XT-PIC-XT 5: 3XT-PIC-XT 7: 4XT-PIC-XTirda0 8: 0XT-PIC-XTrtc0 9:583XT-PIC-XTacpi 10: 2XT-PIC-XT 11: 31483XT-PIC-XTyenta, yenta, yenta, ohci_hcd:usb1, ALI 5451, pcmcia0.0 12: 28070XT-PIC-XTi8042 14: 21705XT-PIC-XTide0 15: 82123XT-PIC-XTide1 NMI: 0 Non-maskable interrupts TRM: 0 Thermal event interrupts SPU: 0 Spurious interrupts ERR: 0 I hope that's not a separate bug... -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6.25-rc2, 2.6.24-rc8] page allocation failure...
On Sun, 17 Feb 2008 13:20:59 + Daniel J Blueman [EMAIL PROTECTED] wrote: I'm still hitting this with e1000e on 2.6.25-rc2, 10 times again. It's clearly non-fatal, but then do we expect it to occur? Daniel --- [dmesg] [ 1250.822786] swapper: page allocation failure. order:3, mode:0x4020 [ 1250.822786] Pid: 0, comm: swapper Not tainted 2.6.25-rc2-119 #2 [ 1250.822786] [ 1250.822786] Call Trace: [ 1250.822786] IRQ [8025fe9e] __alloc_pages+0x34e/0x3a0 [ 1250.822786] [8048c6df] ? __netdev_alloc_skb+0x1f/0x40 [ 1250.822786] [8027acc2] __slab_alloc+0x102/0x3d0 [ 1250.822786] [8048c6df] ? __netdev_alloc_skb+0x1f/0x40 [ 1250.822786] [8027b8cb] __kmalloc_track_caller+0x7b/0xc0 [ 1250.822786] [8048b74f] __alloc_skb+0x6f/0x160 [ 1250.822786] [8048c6df] __netdev_alloc_skb+0x1f/0x40 [ 1250.822786] [8042652d] e1000_alloc_rx_buffers+0x1ed/0x260 [ 1250.822786] [80426b5a] e1000_clean_rx_irq+0x22a/0x330 [ 1250.822786] [80422981] e1000_clean+0x1e1/0x540 [ 1250.822786] [8024b7a5] ? tick_program_event+0x45/0x70 [ 1250.822786] [804930ba] net_rx_action+0x9a/0x150 [ 1250.822786] [802336b4] __do_softirq+0x74/0xf0 [ 1250.822786] [8020c5fc] call_softirq+0x1c/0x30 [ 1250.822786] [8020eaad] do_softirq+0x3d/0x80 [ 1250.822786] [80233635] irq_exit+0x85/0x90 [ 1250.822786] [8020eba5] do_IRQ+0x85/0x100 [ 1250.822786] [8020a5b0] ? mwait_idle+0x0/0x50 [ 1250.822786] [8020b981] ret_from_intr+0x0/0xa [ 1250.822786] EOI [8020a5f5] ? mwait_idle+0x45/0x50 [ 1250.822786] [80209a92] ? enter_idle+0x22/0x30 [ 1250.822786] [8020a534] ? cpu_idle+0x74/0xa0 [ 1250.822786] [80527825] ? rest_init+0x55/0x60 They're regularly reported with e1000 too - I don't think aything really changed. e1000 has this crazy problem where because of a cascade of follies (mainly borked hardware) it has to do a 32kb allocation for a 9kb(?) packet. It would be sad if that was carried over into e1000e? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: include/linux/pcounter.h
On Sat, 16 Feb 2008 11:07:29 +0100 Eric Dumazet [EMAIL PROTECTED] wrote: Andrew, pcounter is a temporary abstraction. It's buggy! Main problems are a) possible return of negative numbers b) some of the API can't be from preemptible code c) excessive interrupt-off time on some machines if used from irq-disabled sections. It is temporaty because it will vanish as soon as Christoph Clameter (or somebody else) provides real cheap per cpu counter implementation. numbers? most of percpu_counter_add() is only executed once per FBC_BATCH calls. At time I introduced it in network tree (locally, not meant to invade kernel land and makes you unhappy :) ), the goals were : Well maybe as a temporary networking-only thing OK, based upon performance-tested results. But I don't think the present code is suitable as part of the kernel-wide toolkit. Some counters (total sockets count) were a single integer, that were doing ping-pong between cpus (SMP/NUMA). As they are basically lazy values (as we dont really need to read their value), using plain atomic_t was overkill. Using a plain percpu_counters was expensive (NR_CPUS*(32+sizeof(void *)) instead of num_possible_cpus()*4). No, percpu_counters use alloc_percpu(), which is O(num_possible_cpus), not O(NR_CPUS). Using 'online' instead of 'possible' stuff is not really needed for a temporary thing. This was put in ./lib/! - We dont care of read sides. Well the present single caller in networking might not care. But this was put in ./lib/ and was exported to modules. That is an invitation to all kernel developers to use it in new code. Which may result in truly awful performance on high-cpu-count machines. We want really fast write side. Real fast. eh? It's called on a per-connection basis, not on a per-packet basis? Read side is when you do a cat /proc/net/sockstat. That is ... once in a while... For the current single caller. But it's in ./lib/. And there's always someone out there who does whatever we don't expect them to do. Now when we allocate a new socket, code to increment the socket count is : c03a74a8 tcp_pcounter_add: c03a74a8: b8 90 26 5f c0 mov$0xc05f2690,%eax c03a74ad: 64 8b 0d 10 f1 5e c0mov%fs:0xc05ef110,%ecx c03a74b4: 01 14 01add%edx,(%ecx,%eax,1) c03a74b7: c3 ret I can't find that code. I suspect that's the DEFINE_PER_CPU flavour, which isn't used anywhere afaict. Plus this omits the local_irq_save/restore (or preempt_disable/enable) and the indirect function call, which can be expensive. That is 4 instructions. I could be two in the future, thanks to current work on fs/gs based percpu variables. Current percpu_counters implementation is more expensive : c021467b __percpu_counter_add: c021467b: 55 push %ebp c021467c: 57 push %edi c021467d: 89 c7 mov%eax,%edi c021467f: 56 push %esi c0214680: 53 push %ebx c0214681: 83 ec 04sub$0x4,%esp c0214684: 8b 40 14mov0x14(%eax),%eax c0214687: 64 8b 1d 08 f0 5e c0mov%fs:0xc05ef008,%ebx c021468e: 8b 6c 24 18 mov0x18(%esp),%ebp c0214692: f7 d0 not%eax c0214694: 8b 1c 98mov(%eax,%ebx,4),%ebx c0214697: 89 1c 24mov%ebx,(%esp) c021469a: 8b 03 mov(%ebx),%eax c021469c: 89 c3 mov%eax,%ebx c021469e: 89 c6 mov%eax,%esi c02146a0: c1 fe 1fsar$0x1f,%esi c02146a3: 89 e8 mov%ebp,%eax c02146a5: 01 d3 add%edx,%ebx c02146a7: 11 ce adc%ecx,%esi c02146a9: 99 cltd c02146aa: 39 d6 cmp%edx,%esi c02146ac: 7f 15 jg c02146c3 __percpu_counter_add+0x48 c02146ae: 7c 04 jl c02146b4 One of the above two branches is taken ((FBC_BATCH-1)/FBC_BATCH)ths of the time. __percpu_counter_add+0x39 c02146b0: 39 eb cmp%ebp,%ebx c02146b2: 73 0f jaec02146c3 __percpu_counter_add+0x48 c02146b4: f7 dd neg%ebp c02146b6: 89 e8 mov%ebp,%eax c02146b8: 99 cltd c02146b9: 39 d6 cmp%edx,%esi c02146bb: 7f 20 jg c02146dd __percpu_counter_add+0x62 c02146bd: 7c 04 jl c02146c3 __percpu_counter_add+0x48 c02146bf: 39 eb cmp%ebp,%ebx c02146c1: 77 1a ja c02146dd __percpu_counter_add+0x62 c02146c3: 89 f8
Re: [PATCH 1/2] remove rcu_assign_pointer(NULL) penalty with type/macro safety
On Wed, 13 Feb 2008 14:00:24 -0800 Paul E. McKenney [EMAIL PROTECTED] wrote: Hello! This is an updated version of the patch posted last November: http://archives.free.net.ph/message/20071201.003721.cd6ff17c.en.html This new version permits arguments with side effects, for example: rcu_assign_pointer(global_p, p++); and also verifies that the arguments are pointers, while still avoiding the unnecessary memory barrier when assigning NULL to a pointer. This memory-barrier avoidance means that rcu_assign_pointer() is now only permitted for pointers (not array indexes), and so this version emits a compiler warning if the first argument is not a pointer. I built a make allyesconfig version on an x86 system, and received no such warnings. If RCU is ever applied to array indexes, then the second patch in this series should be applied, and the resulting rcu_assign_index() be used. Given the rather surprising history of subtlely broken implementations of rcu_assign_pointer(), I took the precaution of generating a full set of test cases and verified that memory barriers and compiler warnings were emitted when required. I guess it is the simple things that get you... Signed-off-by: Paul E. McKenney [EMAIL PROTECTED] --- rcupdate.h | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff -urpNa -X dontdiff linux-2.6.24/include/linux/rcupdate.h linux-2.6.24-rap/include/linux/rcupdate.h --- linux-2.6.24/include/linux/rcupdate.h 2008-01-24 14:58:37.0 -0800 +++ linux-2.6.24-rap/include/linux/rcupdate.h 2008-02-13 13:36:47.0 -0800 @@ -270,12 +270,20 @@ extern struct lockdep_map rcu_lock_map; * structure after the pointer assignment. More importantly, this * call documents which pointers will be dereferenced by RCU read-side * code. + * + * Throws a compiler warning for non-pointer arguments. + * + * Does not insert a memory barrier for a NULL pointer. */ -#define rcu_assign_pointer(p, v) ({ \ - smp_wmb(); \ - (p) = (v); \ - }) +#define rcu_assign_pointer(p, v) \ + ({ \ + typeof(*p) *_p1 = (v); \ + \ + if (!__builtin_constant_p(v) || (_p1 != NULL)) \ + smp_wmb(); \ + (p) = _p1; \ + }) umm... net/netfilter/core.c: In function 'nf_register_afinfo': net/netfilter/core.c:39: warning: initialization discards qualifiers from pointer target type net/netfilter/nf_log.c: In function 'nf_log_register': net/netfilter/nf_log.c:37: warning: initialization discards qualifiers from pointer target type net/netfilter/nf_queue.c: In function 'nf_register_queue_handler': net/netfilter/nf_queue.c:38: warning: initialization discards qualifiers from pointer target type -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: include/linux/pcounter.h
- First up, why was this added at all? We have percpu_counter.h which has several years development invested in it. afaict it would suit the present applications of pcounters. If some deficiency in percpu_counters has been identified, is it possible to correct that deficiency rather than implementing a whole new counter type? That would be much better. - The comments in pcounter.h appear to indicate that there is a performance advantage (and we infer that the advantage is when the statically-allocated flavour of pcounters is used). When compared with percpu_counters the number of data-reference indirections is the same as with percpu_counters, so no advantage there. And, bizarrely, because of a quite inappropriate abstraction toy, both flavours of pcounters add an indirect function call which I believe is significantly more expensive than a plain old pointer indirection. So it's quite possible that DEFINE_PCOUNTER-style counters consume more memory and are slower than percpu_counters. They will surely be much slower on the read side. More below. If we really want to put some helper wrappers around DEFINE_PER_CPU(s32) then I'd suggest that we should do that as a standalone thing and not attempt to graft the same interface onto two quite different types of storage (DEFINE_PER_CPU and alloc_per_cpu) - The comment 2) in pcounter.h (which overflows 80 cols and probably wasn't passed through checkpatch) indicates that some other implementation (presumably plain old DEFINE_PER_CPU) will use NR_CPUS*(32+sizeof(void *)) bytes of storage. But DEFINE_PCOUNTER will use as much memory as DEFINE_PER_CPU(s32) and both pcounter_alloc()-style pcounters and percpu_counters use num_possible_cpus()*sizeof(s32)+epsilon. - The CONFIG_SMP=n stubs in pcounter.h are cheesy and are vulnerable to several well-known compilation risks which I always forget. Should be converted to regular static inlines. - the comment in lib/pcounter.c needlessly exceeds 80 cols. - pcounter_dyn_add() will spew a use-of-smp_processor_id()-in-preemptible-code warning if used in places where one could reasonably use it. The interface could do with a bit of a rethink. Or at least some justification and documentation. - pcounter_getval() will be disastrously inefficient if num_possible_cpus() is much greater than num_online_cpus(). It should use for_each_online_cpu() (as does percpu_counter), and implement a CPU hotplug notifier (as does percpu_counter). It will remain grossly inefficient at high CPU counts, unlike percpu_counters, which solved this problem by doing a batched spill into a central counter at add/sub time. The danger here is that someone will actually use this interface in new code. Six months later (when it's too late to fix it) the big-NUMA guys come up and say whaa, when our user does this it disabled interrupts for N milliseconds. - pcounter_getval() can return incorrect negative numbers. This can cause caller malfunctions in very rare situations because callers just don't expect the things which they're counting to go negative. We experienced this during the evolution of percpu_counter. See percpu_counter_sum_positive() and friends. - pcounter_alloc() should return -ENOMEM on allocation error, not 1. - pcounter_free() perhaps shouldn't test for (self-per_cpu_values != NULL), because callers shouldn't be calling it if pcounter_alloc() failed (arguable). afaict the whole implementation can and should be removed and replaced with percpu_counters. I don't think there's much point in its ability to manage DEFINE_PER_CPU counters: pcounter_getval() remains grossly inefficient (and can return negative values) and quite a bit of new code will need to be put in place to address that. But perhaps there are plans to evolve it into something further in the future, I don't know. But I would suggest that the people who have worked upon percpu_counters (principally Gautham, Peter Z, clameter and me) be involved in that work. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9990] New: tg3: eth0: The system may be re-ordering memory-mapped I/O cycles
On Thu, 14 Feb 2008 01:59:12 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9990 Summary: tg3: eth0: The system may be re-ordering memory-mapped I/O cycles Product: Drivers Version: 2.5 KernelVersion: 2.6.24-git18 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Network AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Latest working kernel version: 2.6.24 Earliest failing kernel version: 2.6.24-git18 Distribution: Debian/testing Hardware Environment: Software Environment: Problem Description: Feb 11 13:11:52 www kernel: [ 12.015569] tg3: eth0: Link is up at 100 Mbps, full duplex. Feb 11 13:11:52 www kernel: [ 12.015633] tg3: eth0: Flow control is on for TX and on for RX. Feb 11 13:33:44 www kernel: [ 1328.538204] tg3: eth0: The system may be re-ordering memory-mapped I/O cycles to the network device, attempting to recover. Please report the problem to the driver maintainer and include system chipset information. Feb 11 13:33:44 www kernel: [ 1328.667255] tg3: eth0: Link is down. Feb 11 13:33:46 www kernel: [ 1330.560734] tg3: eth0: Link is up at 100 Mbps, full duplex. Feb 11 13:33:46 www kernel: [ 1330.560734] tg3: eth0: Flow control is on for TX and on for RX. After that, the machine rebooted (panic?) Feb 11 13:35:14 www kernel: klogd 1.5.0#1.1, log source = /proc/kmsg started. lspci -vvv info: 02:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10) Subsystem: Compaq Computer Corporation NC7782 Gigabit Server Adapter (PCI-X, 10,100,1000-T) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR- INTx- Latency: 64 (16000ns min), Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 19 Region 0: Memory at fdf7 (64-bit, non-prefetchable) [size=64K] [virtual] Expansion ROM at 8814 [disabled] [size=64K] Capabilities: [40] PCI-X non-bridge device Command: DPERE- ERO- RBC=2048 OST=1 Status: Dev=02:02.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz- Capabilities: [48] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable+ DSel=0 DScale=1 PME- Capabilities: [50] Vital Product Data ? Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable- Address: fd7ffd6fdf7deeb8 Data: bdfd Kernel driver in use: tg3 Kernel modules: tg3 02:02.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10) Subsystem: Compaq Computer Corporation NC7782 Gigabit Server Adapter (PCI-X, 10,100,1000-T) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR- INTx- Latency: 64 (16000ns min), Cache Line Size: 64 bytes Interrupt: pin B routed to IRQ 20 Region 0: Memory at fdf6 (64-bit, non-prefetchable) [size=64K] Capabilities: [40] PCI-X non-bridge device Command: DPERE- ERO+ RBC=512 OST=1 Status: Dev=02:02.1 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz- Capabilities: [48] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=1 PME- Capabilities: [50] Vital Product Data ? Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable- Address: f73feeefe7f8 Data: 9bcd Kernel driver in use: tg3 Kernel modules: tg3 Steps to reproduce: -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG: 2.6.25-rc1: iptables postrouting setup causes oops
On Tue, 12 Feb 2008 22:46:01 +1100 Ben Nizette [EMAIL PROTECTED] wrote: On an AVR32, root over NFS, config attached, running (from a startup script): iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE Results in (dmesg extract including a bit of context for good measure): -8 VFS: Mounted root (nfs filesystem). Freeing init memory: 72K (9000 - 90012000) eth0: no IPv6 routers present warning: `dnsmasq' uses 32-bit capabilities (legacy support in use) ip_tables: (C) 2000-2006 Netfilter Core Team nf_conntrack version 0.5.0 (1024 buckets, 4096 max) Unable to handle kernel paging request at virtual address d76a7138 ptbr = 91d3b000 pgd = e5f3 pte = 00014370 Oops: Kernel access of bad area, sig: 11 [#1] FRAME_POINTER chip: 0x01f:0x1e82 rev 2 Modules linked in: nf_conntrack_ipv4(+) nf_conntrack ip_tables PC is at kmem_cache_alloc+0x2c/0x54 LR is at nf_conntrack_l4proto_register+0x34/0x9c [nf_conntrack] I take it that the above means that the crash is in kmem_cache_alloc()? pc : [9004fa78]lr : [c08537d8]Not tainted sp : 91eb5e9c r12: 901cc588 r11: 00d0 r10: 901e03a0 r9 : 0002 r8 : 91d33800 r7 : 91eb5e9c r6 : 901d9138 r5 : 901cc588 r4 : 0007a008 r3 : 00d0 r2 : 0045 r1 : c085f9c0 r0 : c08c1424 Flags: qvNzc Mode bits: hjmdeG CPU Mode: Supervisor Process: insmod [250] (task: 91d789c0 thread: 91eb4000) Stack: (0x91eb5e9c to 0x91eb6000) 5e80:c08537d8 5ea0: 91eb5ec0 c085a6b4 0007a008 fff0 0027 c085f9c0 c08c1424 5ec0: c0861024 91eb5ee4 c085f654 0007a008 901d31b0 0027 c085f9c0 5ee0: c08c1424 900358d4 91eb5f94 91d6ddf0 901d31b8 0007a008 91eb5f08 0007a008 5f00: 5f20: 000c 0007 c08df440 91d1c660 c08c19ec 5f40: c08c16a4 c0881000 00b2 00b2 c085e17c 90022414 0027 5f60: c08c12c3 c08c1424 c08c1a3c c08c1a14 0027 0025 0006 2ab17000 5f80: 90047080 91eb5f94 91eb4000 c087ef80 90012132 00072e14 5fa0: 535e 0007a008 00072858 7f89ff73 8000 91eb4000 0001 2aae23ec 5fc0: 7f89fd98 7f89fd88 2ab17008 0005edf5 0007a008 0005edf5 0073 7f89fedc 5fe0: 00072e14 535e 0007a008 00072858 7f89ff73 0002 00059538 2ab17008 Call trace: [c08537d8] nf_conntrack_l4proto_register+0x34/0x9c [nf_conntrack] [c0861024] nf_conntrack_l3proto_ipv4_init+0x24/0x108 [nf_conntrack_ipv4] [900358d4] sys_init_module+0xf78/0x1028 [90012132] syscall_return+0x0/0x12 iptable_nat: gave up waiting for init of module nf_conntrack_ipv4. iptable_nat: Unknown symbol need_ipv4_conntrack 8- iptables version 1.3.8 If so, the bug could be almost anywhere - in slab, or in some random piece of code which scribbles on slab's data structures. Perfectly repeatable. If my theory is correct, changing pretty much anything in the kernel config might just make it go away. But still, it would be most valuable if you could try running a bisection search, as described in http://www.kernel.org/doc/local/git-quick.html, thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG: 2.6.25-rc1: iptables postrouting setup causes oops
On Wed, 13 Feb 2008 10:10:24 +0100 Haavard Skinnemoen [EMAIL PROTECTED] wrote: On Wed, 13 Feb 2008 00:48:29 -0800 Andrew Morton [EMAIL PROTECTED] wrote: On Tue, 12 Feb 2008 22:46:01 +1100 Ben Nizette [EMAIL PROTECTED] wrote: On an AVR32, root over NFS, config attached, running (from a startup script): iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE Results in (dmesg extract including a bit of context for good measure): -8 VFS: Mounted root (nfs filesystem). Freeing init memory: 72K (9000 - 90012000) eth0: no IPv6 routers present warning: `dnsmasq' uses 32-bit capabilities (legacy support in use) Hmm. What does that mean? What size do capabilities normally have? My near-namesake put than in, but I immediately forgot what it means? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] remove rcu_assign_pointer(NULL) penalty with type/macro safety
On Wed, 13 Feb 2008 14:00:24 -0800 Paul E. McKenney [EMAIL PROTECTED] wrote: Hello! This is an updated version of the patch posted last November: http://archives.free.net.ph/message/20071201.003721.cd6ff17c.en.html This new version permits arguments with side effects, for example: rcu_assign_pointer(global_p, p++); and also verifies that the arguments are pointers, while still avoiding the unnecessary memory barrier when assigning NULL to a pointer. This memory-barrier avoidance means that rcu_assign_pointer() is now only permitted for pointers (not array indexes), and so this version emits a compiler warning if the first argument is not a pointer. I built a make allyesconfig version on an x86 system, and received no such warnings. If RCU is ever applied to array indexes, then the second patch in this series should be applied, and the resulting rcu_assign_index() be used. Given the rather surprising history of subtlely broken implementations of rcu_assign_pointer(), I took the precaution of generating a full set of test cases and verified that memory barriers and compiler warnings were emitted when required. I guess it is the simple things that get you... Signed-off-by: Paul E. McKenney [EMAIL PROTECTED] --- rcupdate.h | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff -urpNa -X dontdiff linux-2.6.24/include/linux/rcupdate.h linux-2.6.24-rap/include/linux/rcupdate.h --- linux-2.6.24/include/linux/rcupdate.h 2008-01-24 14:58:37.0 -0800 +++ linux-2.6.24-rap/include/linux/rcupdate.h 2008-02-13 13:36:47.0 -0800 whoop, ancient kernel alert. @@ -270,12 +270,20 @@ extern struct lockdep_map rcu_lock_map; * structure after the pointer assignment. More importantly, this * call documents which pointers will be dereferenced by RCU read-side * code. + * + * Throws a compiler warning for non-pointer arguments. + * + * Does not insert a memory barrier for a NULL pointer. */ -#define rcu_assign_pointer(p, v) ({ \ - smp_wmb(); \ - (p) = (v); \ - }) +#define rcu_assign_pointer(p, v) \ + ({ \ + typeof(*p) *_p1 = (v); \ + \ + if (!__builtin_constant_p(v) || (_p1 != NULL)) \ + smp_wmb(); \ + (p) = _p1; \ + }) Someone already merged the dont-do-it-for-NULL patch so I reworked this appropriately. Was too lazy to update the changelog though. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] remove rcu_assign_pointer(NULL) penalty with type/macro safety
On Wed, 13 Feb 2008 15:37:44 -0800 Paul E. McKenney [EMAIL PROTECTED] wrote: Ah. It does take a bit to get fib_trie into one's build -- allyesconfig doesn't cut it. This is not good. The sole purpose of allmodconfig and allyesconfig is for compilation and linkage coverage testing. Hence we should aim to get as much code as possible included in those cases. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] remove rcu_assign_pointer(NULL) penalty with type/macro safety
On Wed, 13 Feb 2008 15:57:38 -0800 (PST) David Miller [EMAIL PROTECTED] wrote: From: Andrew Morton [EMAIL PROTECTED] Date: Wed, 13 Feb 2008 15:52:45 -0800 On Wed, 13 Feb 2008 15:37:44 -0800 Paul E. McKenney [EMAIL PROTECTED] wrote: Ah. It does take a bit to get fib_trie into one's build -- allyesconfig doesn't cut it. This is not good. The sole purpose of allmodconfig and allyesconfig is for compilation and linkage coverage testing. Hence we should aim to get as much code as possible included in those cases. Well, in this case there is a choice, either you use one routing lookup datastructure or the other. It's not purposefully being hidden from the everything builds :-) oic. yes, that is a bit of a problem. Oh well. `make randconfig' seems to be able to enable CONFIG_IP_FIB_TRIE about one time in eight ;) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] add rcu_assign_index() if ever needed
On Thu, 14 Feb 2008 09:02:09 +0530 Gautham R Shenoy [EMAIL PROTECTED] wrote: /** + * rcu_assign_index - assign (publicize) a index of a newly + * initialized array elementg that will be dereferenced by RCU I hope Andrew got that one while porting against the latest -mm :) I don't actually read the comments - I just like to make sure they're there ;) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9940] New: e1000e driver with 82566DM-2 controller doesn't strip crc from frames
On Tue, 12 Feb 2008 01:50:49 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9940 Summary: e1000e driver with 82566DM-2 controller doesn't strip crc from frames Product: Drivers Version: 2.5 KernelVersion: 2.6.24 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Network AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Latest working kernel version: n/a Earliest failing kernel version: 2.6.24 Distribution: Gentoo 2007.0 Hardware Environment: HP Compaq dc7800p Software Environment: n/a Problem Description: The e1000e driver doesn't strip the ethernet frame crc is used with the 82566DM-2 chip. This will result, among other things, that bridging doesn't work together with this chip. Reverting commit 140a74802894e9db57e5cd77ccff77e590ece5f3 fixes this issue but has performance implications. Steps to reproduce: Send a ping and monitor the responce with tcpdump. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] fib_trie: improve output format for /proc/net/fib_trie
On Tue, 12 Feb 2008 16:50:44 -0800 Stephen Hemminger [EMAIL PROTECTED] wrote: Make output format prettier (more tree like). local: --- 0.0.0.0/0 |--- 10.111.111.0/24 | +-- 10.111.111.0/32 link broadcast | |--- 10.111.111.254/31 | | +-- 10.111.111.254/32 host local | | +-- 10.111.111.255/32 link broadcast |--- 127.0.0.0/8 | |--- 127.0.0.0/31 | | +-- 127.0.0.0/32 link broadcast | | +-- 127.0.0.0/8 host local | | +-- 127.0.0.1/32 host local | +-- 127.255.255.255/32 link broadcast |--- 192.168.1.0/24 | |--- 192.168.1.0/28 | | +-- 192.168.1.0/32 link broadcast | | +-- 192.168.1.9/32 host local | +-- 192.168.1.255/32 link broadcast main: --- 0.0.0.0/0 |--- 0.0.0.0/4 | +-- 0.0.0.0/0 universe unicast | +-- 10.111.111.0/24 link unicast +-- 169.254.0.0/16 link unicast +-- 192.168.1.0/24 link unicast isn't that a non-back-compatible kernel ABI change? It might break pre-existing parsers? aside: how lame are we to put pretty-printers in the kernel? English-only ones, at that? Root cause: kernel developers still don't have a sufficiently easy way of shipping userspace tools. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Execute tasklets in the same order they were queued
On Mon, 11 Feb 2008 16:28:13 -0600 Olof Johansson [EMAIL PROTECTED] wrote: I noticed this when looking at an openswan issue. Openswan (ab?)uses the tasklet API to defer processing of packets in some situations, with one packet per tasklet_action(). I started noticing sequences of reverse-ordered sequence numbers coming over the wire, since new tasklets are always queued at the head of the list but processed sequentially. Convert it to instead append new entries to the tail of the list. As an extra bonus, the splicing code in takeover_tasklets() no longer has to iterate over the list. hm, I'd have thought that this would already have caused problems in networking. And perhaps this change might have effects on networking too? Probably it won't have _much_ effect on networking because networking probably isn't queueing one tasklet per packet(!) but perhaps with bonded channels or something like that? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9937] New: Bug in bonding driver - Kernel oops whenever driver is loaded with max_bonds parameter
On Mon, 11 Feb 2008 15:04:03 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9937 Summary: Bug in bonding driver - Kernel oops whenever driver is loaded with max_bonds parameter Product: Networking Version: 2.5 KernelVersion: 2.6.24.2 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: high Priority: P1 Component: IPV4 AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Latest working kernel version: Earliest failing kernel version: 2.6.24.2 Distribution: Slackware / Debian GNU/Linux Hardware Environment: HP ProLiant DL380 G5 (Debian), Slackware Acer TravelMate 4001 Laptop Software Environment: Problem Description: Kernel oops whenever bonding driver with max_bonds=2 (or 2) is loaded ... Steps to reproduce: modprobe bonding mode=0 miimon=100 max_bonds=2 or modprobe bonding max_bonds=2 dmesg output (from slackware laptop / x86): BUG: unable to handle kernel NULL pointer dereference at virtual address printing eip: c028eeaf *pde = Oops: [#1] SMP Modules linked in: bonding snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ntfs pcmcia yenta_socket rsrc_nonstatic tifm_7xx1 tifm_core pcmcia_core snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm i2c_i801 snd_timer snd i2c_core shpchp snd_page_alloc ehci_hcd uhci_hcd pci_hotplug Pid: 2729, comm: modprobe Not tainted (2.6.24.2 #2) EIP: 0060:[c028eeaf] EFLAGS: 00010282 CPU: 0 EIP is at strnicmp+0x17/0x61 EAX: d8162800 EBX: ECX: 0010 EDX: 0062 ESI: 0010 EDI: EBP: d8162801 ESP: d82c9f60 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process modprobe (pid: 2729, ti=d82c8000 task=df926550 task.ti=d82c8000) Stack: d8162c80 e0c76814 e0c67170 0001 df80b700 e0c77180 0001 000c d82c8000 e0afe05e e0c6ed14 e0c6ce70 e0c76c00 0805c098 000c c014e355 b7e7a008 0805c098 c0106f12 b7e7a008 00019477 Call Trace: [e0c67170] bond_create+0x4a/0x162 [bonding] [e0afe05e] bonding_init+0x5e/0xf0 [bonding] [c014e355] sys_init_module+0x91/0x11b [c0106f12] syscall_call+0x7/0xb [c047] sctp_setsockopt_bindx+0xe8/0x127 === Code: 08 fe dc ba 98 c7 40 0c 76 54 32 10 c7 40 10 f0 e1 d2 c3 c3 55 89 c5 57 89 d7 31 d2 56 89 ce 53 31 db 85 c9 74 42 0f b6 55 00 45 0f b6 1f 47 84 d2 74 35 84 db 74 31 38 da 74 2a 0f b6 c2 88 d1 EIP: [c028eeaf] strnicmp+0x17/0x61 SS:ESP 0068:d82c9f60 ---[ end trace 75761717808bf4ee ]--- dmesg output (from Debian x86_64 - HP ProLiant DL380): Unable to handle kernel NULL pointer dereference at RIP: [8030271e] strnicmp+0x12/0x5f PGD 223005067 PUD 223b22067 PMD 0 Oops: [1] SMP CPU 7 Modules linked in: bonding mptctl mptbase fan ac battery ipv6 dm_snapshot dm_mirror dm_mod loop usbhid ide_cd cdrom bnx2 generic thermal ipmi_si piix serio_raw evdev shpchp psmouse pci_hotplug container pcspkr ide_core ipmi_msghandler uhci_hcd button processor ehci_hcd e1000 ext3 jbd mbcache reiserfs cciss Pid: 12469, comm: modprobe Not tainted 2.6.24.2 #1 RIP: 0010:[8030271e] [8030271e] strnicmp+0x12/0x5f RSP: 0018:81022339fe00 EFLAGS: 00010202 RAX: 81022307e6c0 RBX: 88233918 RCX: 20e7 RDX: 0010 RSI: RDI: 81022307e000 RBP: R08: 810223b90362 R09: 0010 R10: 8822d60b R11: 0001 R12: R13: 88234b00 R14: 81022307e7c8 R15: FS: 2b07aa3166e0() GS:81022743bd00() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: CR3: 00022339c000 CR4: 06e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process modprobe (pid: 12469, threadinfo 81022339e000, task 8102239aa000) Stack: 882200ce 8102239ad000 0001 8102274273c0 0001 c20011bef960 810225c88540 8809f7bf 882340c0 882340c0 8102263f7f00 Call Trace: [882200ce] :bonding:bond_create+0x4e/0x30e [8809f7bf] :bonding:bonding_init+0x7bf/0x85d [8024f752] sys_init_module+0x176d/0x183f [8020be8e] system_call+0x7e/0x83 Code: 8a 0e 48 ff c7 48 ff c6 45 84 c0 74 36 84 c9 74 32 41 38 c8 RIP [8030271e] strnicmp+0x12/0x5f RSP 81022339fe00 CR2: ---[ end trace ba3d7089e7da64fa ]--- -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at
Re: [Bugme-new] [Bug 9933] New: kernel BUG at include/linux/skbuff.h:912
On Mon, 11 Feb 2008 03:46:45 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9933 Summary: kernel BUG at include/linux/skbuff.h:912 Product: Networking Version: 2.5 KernelVersion: 2.6.24.2 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Netfilter/Iptables AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Latest working kernel version: 2.6.22.3 Earliest failing kernel version: 2.6.24.1 Distribution: Debian etch Hardware Environment: x86_64, SMP If libnetfilter-queue (v. 0.0.12-1) application calls nfq_set_verdict and: - protocol is IPv4 (works fine with IPv6) - new packet length has been changed - packet contains data payload (not affected if tcp header is extended with options, but data payload=0) SKB_LINEAR_ASSERT is catched. [ cut here ] kernel BUG at include/linux/skbuff.h:912! invalid opcode: [1] SMP CPU 4 Modules linked in: nfnetlink_queue nfnetlink ip6table_mangle xt_NFQUEUE iptable_mangle xt_tcpudp nf_conntrack_ipv6 nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables ip6table_filter ip6_tables x_tables esp4 ah4 xfrm4_mode_transport deflate zlib_deflate twofish twofish_common camellia serpent blowfish des_generic cbc ecb blkcipher aes_x86_64 aes_generic xcbc sha256_generic sha1_generic crypto_null af_key dm_crypt dm_snapshot dm_mirror dm_mod ipv6 ipmi_si iTCO_wdt container ipmi_msghandler button serio_raw evdev pcspkr ide_generic ide_cd cdrom pata_acpi ata_generic ata_piix libata scsi_mod usbhid piix generic ide_core ehci_hcd bnx2 uhci_hcd zlib_inflate cciss thermal processor fan Pid: 3390, comm: tcpmd5 Not tainted 2.6.24.2 #1 RIP: 0010:[88258b2c] [88258b2c] :nfnetlink_queue:nfqnl_recv_verdict+0x179/0x227 RSP: 0018:81012d219a08 EFLAGS: 00010206 RAX: 0100 RBX: RCX: 00010001 RDX: 81012e539500 RSI: 81012e539638 RDI: 81012df7ce18 RBP: 0075 R08: 88250079 R09: 81012df7ce18 R10: 7fff576df198 R11: 81012d9daac0 R12: 0014 R13: 81012e691e40 R14: 0001 R15: 81012eae3c20 FS: 2aab53c7a6d0() GS:81012f8fdb40() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 2aab535e6090 CR3: 00012e06c000 CR4: 06e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process tcpmd5 (pid: 3390, threadinfo 81012d218000, task 81012dafc080) Stack: 8825000a 81012d219a60 81012e524740 81012d219a60 81012d219af8 88258ff8 81012eae3c00 81012d219ac8 81012f9d7a80 88255233 804e42e8 Call Trace: [88255233] :nfnetlink:nfnetlink_rcv_msg+0x129/0x172 [8825512b] :nfnetlink:nfnetlink_rcv_msg+0x21/0x172 [8825510a] :nfnetlink:nfnetlink_rcv_msg+0x0/0x172 [803d23b6] netlink_rcv_skb+0x34/0x8b [8825501f] :nfnetlink:nfnetlink_rcv+0x1f/0x2c [803d2156] netlink_unicast+0x1e0/0x240 [803d29eb] netlink_sendmsg+0x2a2/0x2b5 [803ba345] memcpy_fromiovec+0x36/0x66 [803b3790] sock_sendmsg+0xe2/0xff [80243e10] autoremove_wake_function+0x0/0x2e [803b3790] sock_sendmsg+0xe2/0xff [80243e10] autoremove_wake_function+0x0/0x2e [80312f8d] xfs_vn_getattr+0x3d/0xfd [803b29c0] move_addr_to_kernel+0x25/0x36 [803b39c1] sys_sendmsg+0x214/0x287 [803b3b5c] sys_sendto+0x128/0x151 [8027bbf5] do_readv_writev+0x18f/0x1a4 [8020be2e] system_call+0x7e/0x83 Code: 0f 0b eb fe 44 01 e0 44 01 67 68 3b 87 b8 00 00 00 89 87 b4 RIP [88258b2c] :nfnetlink_queue:nfqnl_recv_verdict+0x179/0x227 RSP 81012d219a08 ---[ end trace 303d8add98149551 ]--- I cannot reproduce problem on kernel 2.6.22.3 (both i386 and x86-64) and 2.6.24.2 if arch is i386. tcpmd5 application http://tcpmd5.googlecode.com/files/tcpmd5_0.0.3.tar.gz -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9923] New: Unable conect to internet over ISDN AVM USB BlueFritz
On Sat, 9 Feb 2008 03:36:26 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9923 Summary: Unable conect to internet over ISDN AVM USB BlueFritz Product: Networking Version: 2.5 KernelVersion: 2.6.24 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Other AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Latest working kernel version:2.6.22 Earliest failing kernel version:2.24 A regression. Distribution:opensuse 10.3 Hardware Environment:AVM BlueFritz USB ISDN Software Environment: Problem Description: ifup ippp0 ippp0 SIOCSIFFLAGS: Invalid argument Cannot enable interface ippp0 Steps to reproduce: Feb 9 11:20:54 linux-zhr0 klogd: ippp0: dialing 1 019193383... Feb 9 11:20:56 linux-zhr0 klogd: isdn_net: ippp0 connected Feb 9 11:20:56 linux-zhr0 klogd: capidrv-1: chan 0 up with ncci 0x10101 Feb 9 11:20:56 linux-zhr0 ipppd[5624]: Local number: 906138, Remote number: 019193383, Type: outgoing Feb 9 11:20:56 linux-zhr0 ipppd[5624]: PHASE_WAIT - PHASE_ESTABLISHED, ifunit: 0, linkunit: 0, fd: 7 Feb 9 11:20:56 linux-zhr0 ipppd[5624]: ioctl(SIOCSIFMTU): Invalid argument, 6 ippp0 1524. Feb 9 11:20:56 linux-zhr0 ipppd[5624]: MPPP negotiation, He: Yes We: Yes Feb 9 11:20:56 linux-zhr0 ipppd[5624]: CCP enabled! Trying CCP. Feb 9 11:20:56 linux-zhr0 ipppd[5624]: CCP: got ccp-unit 0 for link 0 (Compression Control Protocol) Feb 9 11:20:56 linux-zhr0 ipppd[5624]: ccp_resetci! Feb 9 11:20:56 linux-zhr0 ipppd[5624]: ccp_resetci! Feb 9 11:20:58 linux-zhr0 klogd: Received CCP frame from peer slot(0) Feb 9 11:20:58 linux-zhr0 klogd: [0/0].ccp-rcv[0]: 01 01 00 09 11 05 00 01 04 Feb 9 11:20:58 linux-zhr0 klogd: Received CCP frame from daemon: Feb 9 11:20:58 linux-zhr0 klogd: [0/0].ccp-xmit[0]: ff 03 80 fd 01 01 00 04 Feb 9 11:20:58 linux-zhr0 klogd: Received CCP frame from daemon: Feb 9 11:20:58 linux-zhr0 klogd: [0/0].ccp-xmit[0]: ff 03 80 fd 04 01 00 09 11 05 00 01 04 Feb 9 11:20:58 linux-zhr0 klogd: Received CCP frame from peer slot(0) Feb 9 11:20:58 linux-zhr0 klogd: [0/0].ccp-rcv[0]: 04 01 00 04 Feb 9 11:20:58 linux-zhr0 ipppd[5624]: ccp_resetci! Feb 9 11:20:58 linux-zhr0 klogd: Received CCP frame from peer slot(0) Feb 9 11:20:58 linux-zhr0 klogd: [0/0].ccp-rcv[0]: 01 02 00 0a 11 06 00 01 01 03 Feb 9 11:20:58 linux-zhr0 klogd: Received CCP frame from daemon: Feb 9 11:20:58 linux-zhr0 klogd: [0/0].ccp-xmit[0]: ff 03 80 fd 01 02 00 04 Feb 9 11:20:58 linux-zhr0 klogd: Received CCP frame from daemon: Feb 9 11:20:58 linux-zhr0 klogd: [0/0].ccp-xmit[0]: ff 03 80 fd 04 02 00 0a 11 06 00 01 01 03 Feb 9 11:20:58 linux-zhr0 klogd: Received CCP frame from peer slot(0) Feb 9 11:20:58 linux-zhr0 klogd: [0/0].ccp-rcv[0]: 04 02 00 04 Feb 9 11:20:59 linux-zhr0 ipppd[5624]: local IP address 149.225.68.4 Feb 9 11:20:59 linux-zhr0 ipppd[5624]: remote IP address 139.4.250.4 Feb 9 11:20:59 linux-zhr0 ipppd[5624]: Ifup: ioctl(SIOCSIFFLAGS): Invalid argument Feb 9 11:20:59 linux-zhr0 ip-down: SIOCSIFFLAGS: Invalid argument -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 1/7] typhoon section fix
On Fri, 08 Feb 2008 08:59:10 -0500 David Dillow [EMAIL PROTECTED] wrote: On Fri, 2008-02-08 at 03:11 -0800, [EMAIL PROTECTED] wrote: From: Andrew Morton [EMAIL PROTECTED] gcc-3.4.4 on powerpc: drivers/net/typhoon.c:137: error: version causes a section type conflict Cc: Jeff Garzik [EMAIL PROTECTED] Cc: Sam Ravnborg [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] -static const char version[] __devinitdata = +static char version[] __devinitdata = typhoon.c: version DRV_MODULE_VERSION ( DRV_MODULE_RELDATE )\n; Wouldn't going to __devinitconst be better? I'll try to whip up a patch and test-compile it. Sam told me that doesn't work right, that this approach is the one to use and, iirc, that __devinitcont and friends will be removed. I'm not sure why, actually. I think I missed that dicussion. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9920] New: kernel panic when using ebtables redirect target
On Fri, 8 Feb 2008 17:40:20 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9920 Summary: kernel panic when using ebtables redirect target Product: Networking Version: 2.5 KernelVersion: 2.6.24 and 2.6.24-git Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Other AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Latest working kernel version: 2.6.22 ( did not test 2.6.23 ) Earliest failing kernel version: 2.6.24 Distribution: Hardware Environment: Software Environment: bridge working as a router Problem Description: when using ebtables to set up target-redirect, there will be kernel panic Steps to reproduce: 1. set up a basic bridge br0 with slaves eth0, eth1 2. on the bridge setup a default router to route traffic 3. use ebtables to setup target redirect, ebtables -t broute -A BROUTING --logical-in br0 \ -p ipv4 --ip-protocol tcp --ip-destination-port 80 \ -j redirect --redirect-target ACCEPT 4. from a client which is connect to the bridge, send some traffic to allow the BROUTE chain to be traversed :- lynx http://www.google.com 5. Kernel panic :- Pid: 0, comm: swapper Not tainted (2.6.24-tmc #1) EIP: 0060:[c69f61aa] EFLAGS: 0217 CPU: 0 EIP is at ebt_do_table+0x4ea/0x5d0 [ebtables] EAX: EBX: ECX: EDX: 0001 ESI: c69f1178 EDI: c69f1108 EBP: c69f1000 ESP: c0315e20 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 Process swapper (pid: 0, ti=c0314000 task=c02f1300 task.ti=c0314000) Stack: c69f11dc 0004 c28c7800 c2b79c20 0005 c69de350 0001 0002 c69ed040 c69ed040 c69f1000 00b0 00b0 c29b0812 c69f1122 a0c3 c29b0812 Call Trace: [c69de032] ebt_broute+0x22/0x30 [ebtable_broute] [c69fef48] br_handle_frame+0xb8/0x220 [bridge] [c02274ac] netif_receive_skb+0x19c/0x440 [c0229ffb] process_backlog+0x6b/0xd0 [c0229a45] net_rx_action+0x105/0x1b0 [c011f835] __do_softirq+0x75/0xf0 [c011f8e7] do_softirq+0x37/0x40 [c011fb25] irq_exit+0x75/0x80 [c010d877] smp_apic_timer_interrupt+0x57/0x90 [c0105b34] apic_timer_interrupt+0x28/0x30 [c0103cd0] default_idle+0x0/0x40 [c0103cff] default_idle+0x2f/0x40 [c0103443] cpu_idle+0x73/0xa0 [c0319cd5] start_kernel+0x2c5/0x340 [c0319420] unknown_bootoption+0x0/0x1e0 === Code: 00 00 83 f9 fe 74 64 83 f9 fc 0f 84 d7 fb ff ff 83 f9 fd 0f 84 bb fc ff ff 8b 5c 24 30 8b 54 24 34 8d 04 5b 8d 04 82 8b 54 24 20 89 28 42 89 50 08 8b 5f 6c 01 df 89 78 04 8b 6c 24 38 8b 54 24 EIP: [c69f61aa] ebt_do_table+0x4ea/0x5d0 [ebtables] SS:ESP 0068:c0315e20 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] net: sh_eth: Add support for Renesas SuperH Ethernet
On Thu, 07 Feb 2008 17:39:23 +0900 Yoshihiro Shimoda [EMAIL PROTECTED] wrote: Add support for Renesas SuperH Ethernet controller. This driver supported SH7710 and SH7712. Nice looking driver. Quick comments: ... +/* + * Program the hardware MAC address from dev-dev_addr. + */ +static void __init update_mac_address(struct net_device *ndev) +{ + u32 ioaddr = ndev-base_addr; + + ctrl_outl((ndev-dev_addr[0] 24) | (ndev-dev_addr[1] 16) | + (ndev-dev_addr[2] 8) | (ndev-dev_addr[3]), + ioaddr + MAHR); + ctrl_outl((ndev-dev_addr[4] 8) | (ndev-dev_addr[5]), + ioaddr + MALR); +} + +/* + * Get MAC address from SuperH MAC address register + * + * SuperH's Ethernet device doesn't have 'ROM' to MAC address. + * This driver get MAC address that use by bootloader(U-boot or sh-ipl+g). + * When you want use this device, you must set MAC address in bootloader. + * + */ +static void __init read_mac_address(struct net_device *ndev) +{ + u32 ioaddr = ndev-base_addr; + + ndev-dev_addr[0] = (ctrl_inl(ioaddr + MAHR) 24); + ndev-dev_addr[1] = (ctrl_inl(ioaddr + MAHR) 16) 0xFF; + ndev-dev_addr[2] = (ctrl_inl(ioaddr + MAHR) 8) 0xFF; + ndev-dev_addr[3] = (ctrl_inl(ioaddr + MAHR) 0xFF); + ndev-dev_addr[4] = (ctrl_inl(ioaddr + MALR) 8) 0xFF; + ndev-dev_addr[5] = (ctrl_inl(ioaddr + MALR) 0xFF); +} Both the above functions are called from non-__init code and hence cannot be __init. sh_eth_tsu_init() is wrong too. Please check all section annotations in the driver. +struct bb_info { + struct mdiobb_ctrl ctrl; + u32 addr; + u32 mmd_msk;/* MMD */ + u32 mdo_msk; + u32 mdi_msk; + u32 mdc_msk; +}; Please cc David Brownell on updates to this driver - perhaps he will find time to review the bit-banging interface usage. +/* PHY bit set */ +static void bb_set(u32 addr, u32 msk) +{ + ctrl_outl(ctrl_inl(addr) | msk, addr); +} + +/* PHY bit clear */ +static void bb_clr(u32 addr, u32 msk) +{ + ctrl_outl((ctrl_inl(addr) ~msk), addr); +} + +/* PHY bit read */ +static int bb_read(u32 addr, u32 msk) +{ + return (ctrl_inl(addr) msk) != 0; +} + +/* Data I/O pin control */ +static inline void sh__mmd_ctrl(struct mdiobb_ctrl *ctrl, int bit) +{ + struct bb_info *bitbang = container_of(ctrl, struct bb_info, ctrl); + if (bit) + bb_set(bitbang-addr, bitbang-mmd_msk); + else + bb_clr(bitbang-addr, bitbang-mmd_msk); +} + +/* Set bit data*/ +static inline void sh__set_mdio(struct mdiobb_ctrl *ctrl, int bit) +{ + struct bb_info *bitbang = container_of(ctrl, struct bb_info, ctrl); + + if (bit) + bb_set(bitbang-addr, bitbang-mdo_msk); + else + bb_clr(bitbang-addr, bitbang-mdo_msk); +} + +/* Get bit data*/ +static inline int sh__get_mdio(struct mdiobb_ctrl *ctrl) +{ + struct bb_info *bitbang = container_of(ctrl, struct bb_info, ctrl); + return bb_read(bitbang-addr, bitbang-mdi_msk); +} There seems to be a fairly random mixture of inline and non-inline here. I'd suggest that you just remove all the `inline's. The compiler does a pretty good job of working doing this for you. +/* MDC pin control */ +static inline void sh__mdc_ctrl(struct mdiobb_ctrl *ctrl, int bit) +{ + struct bb_info *bitbang = container_of(ctrl, struct bb_info, ctrl); + + if (bit) + bb_set(bitbang-addr, bitbang-mdc_msk); + else + bb_clr(bitbang-addr, bitbang-mdc_msk); +} + +/* mdio bus control struct */ +static struct mdiobb_ops bb_ops = { + .owner = THIS_MODULE, + .set_mdc = sh__mdc_ctrl, + .set_mdio_dir = sh__mmd_ctrl, + .set_mdio_data = sh__set_mdio, + .get_mdio_data = sh__get_mdio, +}; It's particularly inappropriate that sh__mdc_ctrl() was inlined - it is only ever called via a function pointer and hence will never be inlined! ... +static void sh_eth_timer(unsigned long data) +{ + struct net_device *ndev = (struct net_device *)data; + struct sh_eth_private *mdp = netdev_priv(ndev); + int next_tick = 10 * HZ; + + /* We could do something here... nah. */ + mdp-timer.expires = jiffies + next_tick; + add_timer(mdp-timer); mod_timer() would be neater here. +} ... + +/* network device open function */ +static int sh_eth_open(struct net_device *ndev) +{ + int ret = 0; + struct sh_eth_private *mdp = netdev_priv(ndev); + + ret = request_irq(ndev-irq, sh_eth_interrupt, 0, ndev-name, ndev); + if (ret) { + printk(KERN_ERR Can not assign IRQ number to %s\n, CARDNAME); + return ret; + } + + /* Descriptor set */ + ret = sh_eth_ring_init(ndev); + if (ret) + goto out_free_irq; + + /* device init */ + ret = sh_eth_dev_init(ndev); + if (ret) + goto out_free_irq; + + /*
Re: [Bugme-new] [Bug 9914] New: bnx2 driver of latest kernel 2.6.24 not working with Cisco catalyst 650x Switch
On Thu, 7 Feb 2008 23:06:55 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9914 Summary: bnx2 driver of latest kernel 2.6.24 not working with Cisco catalyst 650x Switch Product: Drivers Version: 2.5 KernelVersion: 2.6.24 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Network AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Latest working kernel version: (Works with Redhat 4 kernel) Earliest failing kernel version: 2.6.24 Distribution: Self, kernel.org's kernel version 2.6.24 Hardware Environment: Dell 2950 Software Environment: Linux from Scratch, kernel version 2.6.24 Problem Description: I am using the latest kernel 2.6.24. The hardware unit is a 2950 Dell box. The firmware version of bnx2 is 2.9.1. The bnx2 driver does work with other flavors of Cisco Switches like 290x. It is having problems at a customer site who has a Cisco 650x. The customer tried the RHEL 4 and does seem to work fine. The RHEL4 comes with the version of bnx2: 1.4.38 The version on 2.6.24 is 1.6.9. Can anyone give me pointers as to why this is broken? Where can I get a patch? Any help is appreciated. Steps to reproduce: The driver does not auto-negotiate, neither does it work with a fixed link speed. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: locking api self-test hanging
On Mon, 4 Feb 2008 04:43:04 -0800 Andrew Morton [EMAIL PROTECTED] wrote: On Sun, 3 Feb 2008 15:07:44 -0800 Andrew Morton [EMAIL PROTECTED] wrote: On Sun, 3 Feb 2008 15:02:46 -0800 Andrew Morton [EMAIL PROTECTED] wrote: With current mainline I'm getting intermittent hangs here: http://userweb.kernel.org/~akpm/p2033590.jpg with this config: http://userweb.kernel.org/~akpm/config-sony.txt on the Vaio. Sometimes it boots (then hits another different hang), sometimes it gets stuck there. CONFIG_DEBUG_LOCKING_API_SELFTESTS=n fixed that up. The second hang is in kobject_uevent_init(). All that function does is call netlink_kernel_create(). And I've fully bisected this hang twice and both times came up with commit 33f807ba0d9259e7c75c7a2ce8bd2787e5b540c7 Author: Stephen Hemminger [EMAIL PROTECTED] Date: Mon Nov 19 19:24:52 2007 -0800 [NETPOLL]: Kill NETPOLL_RX_DROP, set but never tested. which is stupid because that patch doesn't do anything. However I am using netconsole-over-e100 and the hang does go away when I disable netconsole on the kernel boot command line. I'd say it's some timing thing in netpoll/netconsole/napi/etc. I'm seeing this netconsole hang on a second machine now. Current mainline. It also has e100. This one is SMP ancient PIII. Config: http://userweb.kernel.org/~akpm/config-vmm.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: locking api self-test hanging
On Wed, 6 Feb 2008 00:34:12 -0800 Andrew Morton [EMAIL PROTECTED] wrote: On Mon, 4 Feb 2008 04:43:04 -0800 Andrew Morton [EMAIL PROTECTED] wrote: On Sun, 3 Feb 2008 15:07:44 -0800 Andrew Morton [EMAIL PROTECTED] wrote: On Sun, 3 Feb 2008 15:02:46 -0800 Andrew Morton [EMAIL PROTECTED] wrote: With current mainline I'm getting intermittent hangs here: http://userweb.kernel.org/~akpm/p2033590.jpg with this config: http://userweb.kernel.org/~akpm/config-sony.txt on the Vaio. Sometimes it boots (then hits another different hang), sometimes it gets stuck there. CONFIG_DEBUG_LOCKING_API_SELFTESTS=n fixed that up. The second hang is in kobject_uevent_init(). All that function does is call netlink_kernel_create(). And I've fully bisected this hang twice and both times came up with commit 33f807ba0d9259e7c75c7a2ce8bd2787e5b540c7 Author: Stephen Hemminger [EMAIL PROTECTED] Date: Mon Nov 19 19:24:52 2007 -0800 [NETPOLL]: Kill NETPOLL_RX_DROP, set but never tested. which is stupid because that patch doesn't do anything. However I am using netconsole-over-e100 and the hang does go away when I disable netconsole on the kernel boot command line. I'd say it's some timing thing in netpoll/netconsole/napi/etc. I'm seeing this netconsole hang on a second machine now. Current mainline. It also has e100. This one is SMP ancient PIII. Config: http://userweb.kernel.org/~akpm/config-vmm.txt I can reproduce this on a third machine: the t61p laptop: dual x86_64 with e1000. It seems to need quite a lot of printk activity to make it happen. Turning on initcall_debug is a suitable way of triggering it. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 2/4] forcedeth: fix MAC address detection on network card (regression in 2.6.23)
On Tue, 05 Feb 2008 13:20:59 -0500 Jeff Garzik [EMAIL PROTECTED] wrote: Signed-off-by: Andrew Morton [EMAIL PROTECTED] NAK - this fixes one set of users, and breaks a working set of users. Need to add DMI check for the specific motherboard (dmi_check_system), and flip flag according to success/failure of that check. OK :) I added the above to the changelog for next time. You guys can hide, but this patch isn't going away! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
include/linux/pcounter.h
e7d0362dd41e760f340c1b500646cc92522bd9d5 should have been folded into de4d1db369785c29d68915edfee0cb70e8199f4c prior to merging. We now and for ever have a window of breakage which screws up git bisection. Which I just hit. Which is the only reason I discovered the file's existence. Please do not merge pieces of generic kernel infrastructure while keeping it all secret on the netdev list. Ever. That code has a number of deficiencies which I and probably others would have noted had it been offered to us for review. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: locking api self-test hanging
On Mon, 4 Feb 2008 04:43:04 -0800 Andrew Morton [EMAIL PROTECTED] wrote: On Sun, 3 Feb 2008 15:07:44 -0800 Andrew Morton [EMAIL PROTECTED] wrote: On Sun, 3 Feb 2008 15:02:46 -0800 Andrew Morton [EMAIL PROTECTED] wrote: With current mainline I'm getting intermittent hangs here: http://userweb.kernel.org/~akpm/p2033590.jpg with this config: http://userweb.kernel.org/~akpm/config-sony.txt on the Vaio. Sometimes it boots (then hits another different hang), sometimes it gets stuck there. CONFIG_DEBUG_LOCKING_API_SELFTESTS=n fixed that up. The second hang is in kobject_uevent_init(). All that function does is call netlink_kernel_create(). And I've fully bisected this hang twice and both times came up with commit 33f807ba0d9259e7c75c7a2ce8bd2787e5b540c7 Author: Stephen Hemminger [EMAIL PROTECTED] Date: Mon Nov 19 19:24:52 2007 -0800 [NETPOLL]: Kill NETPOLL_RX_DROP, set but never tested. which is stupid because that patch doesn't do anything. However I am using netconsole-over-e100 and the hang does go away when I disable netconsole on the kernel boot command line. I'd say it's some timing thing in netpoll/netconsole/napi/etc. I expect the locking API self-tests are innocent - the netconsole lockup happened to strike during the locking API tests and fooled me. When I disabled those tests, the netconsole lockup struck later on. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: locking api self-test hanging
On Sun, 3 Feb 2008 15:07:44 -0800 Andrew Morton [EMAIL PROTECTED] wrote: On Sun, 3 Feb 2008 15:02:46 -0800 Andrew Morton [EMAIL PROTECTED] wrote: With current mainline I'm getting intermittent hangs here: http://userweb.kernel.org/~akpm/p2033590.jpg with this config: http://userweb.kernel.org/~akpm/config-sony.txt on the Vaio. Sometimes it boots (then hits another different hang), sometimes it gets stuck there. CONFIG_DEBUG_LOCKING_API_SELFTESTS=n fixed that up. The second hang is in kobject_uevent_init(). All that function does is call netlink_kernel_create(). And I've fully bisected this hang twice and both times came up with commit 33f807ba0d9259e7c75c7a2ce8bd2787e5b540c7 Author: Stephen Hemminger [EMAIL PROTECTED] Date: Mon Nov 19 19:24:52 2007 -0800 [NETPOLL]: Kill NETPOLL_RX_DROP, set but never tested. which is stupid because that patch doesn't do anything. However I am using netconsole-over-e100 and the hang does go away when I disable netconsole on the kernel boot command line. I'd say it's some timing thing in netpoll/netconsole/napi/etc. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.24-mm1] error compiling net driver NE2000/NE1000
On Mon, 4 Feb 2008 16:29:21 +0100 Pierre Peiffer [EMAIL PROTECTED] wrote: Hi, When I compile the kernel 2.6.24-mm1 with: CONFIG_NET_ISA=y CONFIG_NE2000=y I have the following compile error: ... GEN .version CHK include/linux/compile.h UPD include/linux/compile.h CC init/version.o LD init/built-in.o LD .tmp_vmlinux1 drivers/built-in.o: In function `ne_block_output': linux-2.6.24-mm1/drivers/net/ne.c:797: undefined reference to `NS8390_init' drivers/built-in.o: In function `ne_drv_resume': linux-2.6.24-mm1/drivers/net/ne.c:858: undefined reference to `NS8390_init' drivers/built-in.o: In function `ne_probe1': linux-2.6.24-mm1/drivers/net/ne.c:539: undefined reference to `NS8390_init' make[1]: *** [.tmp_vmlinux1] Error 1 make: *** [sub-make] Error 2 Thanks for reporting this. As I saw that the file 8390p.c is compiled for this driver, but not the file 8390.c which contains this function NS8390_init(), I fixed this error with the following patch. Alan's 8390-split-8390-support-into-a-pausing-and-a-non-pausing-driver-core.patch would be a prime suspect. I assume this bug isn't present ing mainline or in 2.6.24? As NS8390p_init() does the same thing than NS8390_init(), I suppose that this is the right fix ? Signed-off-by: Pierre Peiffer [EMAIL PROTECTED] --- drivers/net/ne.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) Index: b/drivers/net/ne.c === --- a/drivers/net/ne.c +++ b/drivers/net/ne.c @@ -536,7 +536,7 @@ static int __init ne_probe1(struct net_d #ifdef CONFIG_NET_POLL_CONTROLLER dev-poll_controller = eip_poll; #endif - NS8390_init(dev, 0); + NS8390p_init(dev, 0); ret = register_netdev(dev); if (ret) @@ -794,7 +794,7 @@ retry: if (time_after(jiffies, dma_start + 2*HZ/100)) { /* 20ms */ printk(KERN_WARNING %s: timeout waiting for Tx RDC.\n, dev-name); ne_reset_8390(dev); - NS8390_init(dev,1); + NS8390p_init(dev,1); break; } @@ -855,7 +855,7 @@ static int ne_drv_resume(struct platform if (netif_running(dev)) { ne_reset_8390(dev); - NS8390_init(dev, 1); + NS8390p_init(dev, 1); netif_device_attach(dev); } return 0; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/24 for-2.6.25] DM9000 updates for 2.6.25
On Tue, 05 Feb 2008 00:01:59 + Ben Dooks [EMAIL PROTECTED] wrote: Subject: [PATCH 00/24 for-2.6.25] DM9000 updates for 2.6.25 Holy cow. This patch set is a series of updates for the DM9000 driver, to tidy-up some of the source, stop the accesses to the PHY and EEPROM sitting and spinning with locks held, and to add ethtool support. Jeff, the immediate issue is that the driver doesn't compile on mips. I have the below lameo fix for it, but it appears to be wrong. Or at least suboptimal. So if you're unprepared to chew on this lot (and 24 patches two weeks into the merge window is one hell of a chew) then we do need to get that regression fixed, at least. From: Andrew Morton [EMAIL PROTECTED] mips: drivers/net/dm9000.c: In function `dm9000_open': drivers/net/dm9000.c:627: error: `IRQT_RISING' undeclared (first use in this function) drivers/net/dm9000.c:627: error: (Each undeclared identifier is reported only once drivers/net/dm9000.c:627: error: for each function it appears in.) Cc: Daniel Mack [EMAIL PROTECTED] Cc: Russell King [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- drivers/net/dm9000.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff -puN drivers/net/dm9000.c~drivers-net-dm9000c-vague-probably-wrong-build-fix drivers/net/dm9000.c --- a/drivers/net/dm9000.c~drivers-net-dm9000c-vague-probably-wrong-build-fix +++ a/drivers/net/dm9000.c @@ -113,8 +113,10 @@ #define writeswoutsw #define writesloutsl #define DM9000_IRQ_FLAGS (IRQF_SHARED | IRQF_TRIGGER_HIGH) -#else +#elif defined(ARM) #define DM9000_IRQ_FLAGS (IRQF_SHARED | IRQT_RISING) +#else +#define DM9000_IRQ_FLAGS (IRQF_SHARED) #endif /* _ -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9888] New: tun device without protocol info header fails under IPv6
On Mon, 4 Feb 2008 13:46:13 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9888 Summary: tun device without protocol info header fails under IPv6 Product: Networking Version: 2.5 KernelVersion: =2.6.23 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: low Priority: P1 Component: IPV6 AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Latest working kernel version: None known -- appears to be a historic bug Earliest failing kernel version: All Distribution: Hardware Environment: Software Environment: Problem Description: Steps to reproduce: Open a tun device as type TUN, set the TUN_NO_PI flag, and try sending an IPv6 packet. The packet appears at the interface under tcpdumps, but propagates no further. This is because the default protocol info used for tun devices where the TUN_NO_PI flag is set assumes IPv4 as can be seen by the initialization at the top of the tun_get_user function in drivers/net/tun.c file given by struct tun_pi pi = { 0, __constant_htons(ETH_P_IP) }; This can easily be fixed by adding a quick check at the top of tun_get_user. Basically the code that used to read if (!(tun-flags TUN_NO_PI)) { if ((len -= sizeof(pi)) count) return -EINVAL; if(memcpy_fromiovec((void *)pi, iv, sizeof(pi))) return -EFAULT; } when changed to read if (!(tun-flags TUN_NO_PI)) { if ((len -= sizeof(pi)) count) return -EINVAL; if(memcpy_fromiovec((void *)pi, iv, sizeof(pi))) return -EFAULT; } else { /* Fixup default pi if IPv6 rather than IPv4 */ if (((tun-flags TUN_TYPE_MASK) == TUN_TUN_DEV) (*(char *)(iv-iov_base) == 0x60)) { pi.proto = __constant_htons(ETH_P_IPV6); } } fixes the problem. How do we get this in as part of the maintained codebase?? Please email a tested patch prepared as described in Documentation/SubmittingPatches Documentation/SubmitChecklist http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt to Maxim Krasnyansky [EMAIL PROTECTED] David S. Miller [EMAIL PROTECTED] Andrew Morton [EMAIL PROTECTED] netdev@vger.kernel.org thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.24-mm1 - Build failure at net/sched/cls_flow.c:598
On Mon, 04 Feb 2008 23:32:49 +0100 Tilman Schmidt [EMAIL PROTECTED] wrote: My attempt to build this failed with: CC [M] net/sched/cls_flow.o net/sched/cls_flow.c: In function ___flow_dump___: net/sched/cls_flow.c:598: error: ___struct tcf_ematch_tree___ has no member named ___hdr___ Config attached. Thanks. hm. #else /* CONFIG_NET_EMATCH */ struct tcf_ematch_tree { }; methinks Patrick has a CONFIG_NET_EMATCH=n problem? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: include/linux/pcounter.h
On Mon, 04 Feb 2008 16:20:35 -0800 (PST) David Miller [EMAIL PROTECTED] wrote: From: Andrew Morton [EMAIL PROTECTED] Date: Mon, 4 Feb 2008 01:44:02 -0800 Please do not merge pieces of generic kernel infrastructure while keeping it all secret on the netdev list. Ever. It was so damn secret that it sat in your -mm tree for months. I never noticed it and I doubt if anyone else did. How was I (or anyone else) to have known? Don't be rediculious Andrew. There is nothing ridiculous about requiring that new generic kernel infrastructure patches be appropriately submitted and reviewed. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6.24-mm1] TCP/IPv6 connect() oopses at twothirdsMD4Transform()
On Tue, 05 Feb 2008 10:28:43 +0900 Tetsuo Handa [EMAIL PROTECTED] wrote: Hello. Kernel config is at http://I-love.SAKURA.ne.jp/tmp/config-2.6.24-mm1 2.6.24 works fine. Thanks for testing and reporting. It really helps. Regards. -- BUG: unable to handle kernel paging request at 25476bec IP: [c0211c28] twothirdsMD4Transform+0x78/0x37c *pde = Oops: [#1] SMP DEBUG_PAGEALLOC last sysfs file: /sys/devices/pci:00/:00:10.0/host0/target0:0:1/0:0:1:0/type Modules linked in: nfsd lockd sunrpc exportfs pcnet32 Pid: 2148, comm: a.out Not tainted (2.6.24-mm1 #1) EIP: 0060:[c0211c28] EFLAGS: 00010286 CPU: 0 EIP is at twothirdsMD4Transform+0x78/0x37c EAX: 00084000 EBX: 0800 ECX: 8000 EDX: db45ddec ESI: EDI: 52806380 EBP: db45dddc ESP: db45ddc8 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process a.out (pid: 2148, ti=db45d000 task=deaf9250 task.ti=db45d000) Stack: 8000 def6ef9c 6380 c0759d60 db45de1c db45de28 c0211fd2 0040 c0759d40 0100 52806380 1f2e00ba fffa249f 5a696b37 8dbe1970 cf7579d0 3b0cc350 a54b10a8 def6e9a0 def6ef8c Call Trace: [c0211fd2] ? secure_tcpv6_sequence_number+0x58/0x7a [c0322247] ? tcp_v6_connect+0x46d/0x4e3 [c02ade07] ? lock_sock_nested+0x56/0x5e [c02ecbae] ? inet_stream_connect+0x1c/0x163 [c02ecc24] ? inet_stream_connect+0x92/0x163 [c02ab14b] ? sys_connect+0x72/0x98 [c013b945] ? lock_release_holdtime+0x4e/0x54 [c0115250] ? do_page_fault+0x1c5/0x3fc [c013e858] ? __lock_release+0x4b/0x51 [c0115250] ? do_page_fault+0x1c5/0x3fc [c02ab9f5] ? sys_socketcall+0x6f/0x15e [c0103ae7] ? restore_nocheck+0x12/0x15 [c0103a86] ? syscall_call+0x7/0xb === Code: 31 c1 03 0c ba 8b 7a 0c 01 ce 8b 4d ec c1 c6 0b 31 d9 21 f1 31 d9 03 0c ba 8b 7a 10 01 c8 8b 4d ec c1 c0 13 31 f1 21 c1 33 4d ec 03 0c ba 8b 7a 14 01 cb 89 c1 c1 c3 03 31 f1 21 d9 31 f1 03 0c EIP: [c0211c28] twothirdsMD4Transform+0x78/0x37c SS:ESP 0068:db45ddc8 ---[ end trace 160518059a282c77 ]--- err, Matt? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: locking api self-test hanging
On Sun, 3 Feb 2008 15:02:46 -0800 Andrew Morton [EMAIL PROTECTED] wrote: With current mainline I'm getting intermittent hangs here: http://userweb.kernel.org/~akpm/p2033590.jpg with this config: http://userweb.kernel.org/~akpm/config-sony.txt on the Vaio. Sometimes it boots (then hits another different hang), sometimes it gets stuck there. The second hang is in kobject_uevent_init(). All that function does is call netlink_kernel_create(). These things make bisecting all the other bugs rather hard. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] request_irq() always returns -EINVAL with a NULL handler.
On Thu, 17 Jan 2008 17:57:58 +1100 Rusty Russell [EMAIL PROTECTED] wrote: I assume that these ancient network drivers were trying to find out if an irq is available. eepro.c expecting +EBUSY was doubly wrong. I'm not sure that can_request_irq() is the right thing, but these drivers are definitely wrong. request_irq should BUG() on bad input, and these would have been found earlier. This breaks non-CONFIG_GENERIC_HARDIRQS architectures. alpha: drivers/net/3c503.c: In function 'el2_open': drivers/net/3c503.c:382: error: implicit declaration of function 'can_request_irq' -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html