Re: Random panic in load_balance() with 3.16-rc

2014-08-04 Steven Rostedt
On Fri, 25 Jul 2014 11:29:06 -0700 Linus Torvalds wrote: > On Fri, Jul 25, 2014 at 7:02 AM, Steven Rostedt wrote: > > > > But wouldn't it be rather trivial to run a static analyzer on the final > > vmlinux to make sure there are no red zones? I mean, you would only need > > to read each function

Re: Random panic in load_balance() with 3.16-rc

2014-07-29 Michel Dänzer
On 27.07.2014 03:02, Steven Chamberlain wrote: > On 25/07/14 02:25, Michel Dänzer wrote: >> Attached is fair.s from Debian gcc 4.8.3-5. Does that look better? I'm >> going to try reproducing the problem with a kernel built by that now. > > It looks like gcc-4.9 Debian package version 4.9.1-2 avail

Re: Random panic in load_balance() with 3.16-rc

2014-07-29 Jakub Jelinek
On Mon, Jul 28, 2014 at 08:09:02PM +0200, Markus Trippelsdorf wrote: > Here's the testcase: > > int a, b, c; > void fn1 () > { > int d; > if (fn2 () && !0) > { > b = ( >{ >int e; >fn3 (); >switch (0) >default: >a

Re: Random panic in load_balance() with 3.16-rc

2014-07-28 Michel Dänzer
On 29.07.2014 01:48, Linus Torvalds wrote: > On Sun, Jul 27, 2014 at 8:47 PM, Michel Dänzer wrote: >> On 27.07.2014 04:56, Linus Torvalds wrote: >>> >>> Also, Michel - can you try this patch if you still have your >>> gcc-4.9.0 install, and send me the resulting fair.s file again? >> >> Attached.

Re: Random panic in load_balance() with 3.16-rc

2014-07-28 Theodore Ts'o
On Mon, Jul 28, 2014 at 10:27:39AM -0700, Alexei Starovoitov wrote: > > It's not pretty, but adding it unconditionally was the right thing to do. > Black listing compiler versions is too fragile. > Look at the flip side: now size of build dir will be much smaller :) White-listing the fixed compil

Re: Random panic in load_balance() with 3.16-rc

2014-07-28 Markus Trippelsdorf
On 2014.07.28 at 11:28 -0700, Linus Torvalds wrote: > On Mon, Jul 28, 2014 at 11:09 AM, Markus Trippelsdorf > wrote: > > > > It shouldn't be too hard to implement a simple check for the bug in the > > next release. Just compile the gcc/testsuite/gcc.target/i386/pr61801.c > > testcase with -fcompar

Re: Random panic in load_balance() with 3.16-rc

2014-07-28 Linus Torvalds
On Mon, Jul 28, 2014 at 11:09 AM, Markus Trippelsdorf wrote: > > It shouldn't be too hard to implement a simple check for the bug in the > next release. Just compile the gcc/testsuite/gcc.target/i386/pr61801.c > testcase with -fcompare-debug. If gcc returns 0 then > -fvar-tracking-assignments coul

Re: Random panic in load_balance() with 3.16-rc

2014-07-28 Markus Trippelsdorf
On 2014.07.28 at 10:27 -0700, Alexei Starovoitov wrote: > On Mon, Jul 28, 2014 at 09:45:45AM -0700, Linus Torvalds wrote: > > On Mon, Jul 28, 2014 at 5:26 AM, Frank Ch. Eigler wrote: > > > > > > Please note that the data produced by "-g -fvar-tracking" is consumed > > > by tools like systemtap, pe

Re: Random panic in load_balance() with 3.16-rc

2014-07-28 Alexei Starovoitov
On Mon, Jul 28, 2014 at 09:45:45AM -0700, Linus Torvalds wrote: > On Mon, Jul 28, 2014 at 5:26 AM, Frank Ch. Eigler wrote: > > > > Please note that the data produced by "-g -fvar-tracking" is consumed > > by tools like systemtap, perf, crash, and makes a significant > > difference to the observabi

Re: Random panic in load_balance() with 3.16-rc

2014-07-28 Linus Torvalds
On Sun, Jul 27, 2014 at 8:47 PM, Michel Dänzer wrote: > On 27.07.2014 04:56, Linus Torvalds wrote: >> >> Also, Michel - can you try this patch if you still have your >> gcc-4.9.0 install, and send me the resulting fair.s file again? > > Attached. The frame setup looks fine to me now (apart from

Re: Random panic in load_balance() with 3.16-rc

2014-07-28 Linus Torvalds
On Mon, Jul 28, 2014 at 5:26 AM, Frank Ch. Eigler wrote: > > Please note that the data produced by "-g -fvar-tracking" is consumed > by tools like systemtap, perf, crash, and makes a significant > difference to the observability of debug AND non-debug kernels. Yeah, and compared to having a buggy

Re: Random panic in load_balance() with 3.16-rc

2014-07-28 Frank Ch. Eigler
Hi - On Mon, Jul 28, 2014 at 09:10:04AM -0400, Theodore Ts'o wrote: > [...] > I thought Markus told us that -fno-var-tracking-assignments makes > absolutely no difference for non-debug kernels? It does affect CONFIG_DEBUG_INFO kernels, and that config option is set for all Red Hat kernels (-debug

Re: Random panic in load_balance() with 3.16-rc

2014-07-28 Theodore Ts'o
On Mon, Jul 28, 2014 at 08:26:59AM -0400, Frank Ch. Eigler wrote: > Please note that the data produced by "-g -fvar-tracking" is consumed > by tools like systemtap, perf, crash, and makes a significant > difference to the observability of debug AND non-debug kernels. (The > presence of compiled-in

Re: Random panic in load_balance() with 3.16-rc

2014-07-28 Frank Ch. Eigler
torvalds wrote: > [...] > Actually, I prefer my patch that did it with cc-option checking, and > does it unconditionally. > > Because if we do it even for non-debug builds - where it ostensibly > shouldn't matter - we then have that GCC_COMPARE_DEBUG thing working > regardless of configuration.

Re: Random panic in load_balance() with 3.16-rc

2014-07-26 Jakub Jelinek
On Sat, Jul 26, 2014 at 10:20:55PM +0200, Markus Trippelsdorf wrote: > On 2014.07.26 at 15:55 -0400, Theodore Ts'o wrote: > > On Sat, Jul 26, 2014 at 09:35:57PM +0200, Markus Trippelsdorf wrote: > > > > > > But fortunately the workaround for the new inode.c bug is the same as > > > for the origina

Re: Random panic in load_balance() with 3.16-rc

2014-07-26 Linus Torvalds
On Sat, Jul 26, 2014 at 1:19 PM, Markus Trippelsdorf wrote: > > Yes. The option only affects -g builds. Ok, good. I'll wait a bit to hopefully get confirmation from Michel's setup, but this does seem to be the solution. > So, the option should only be enabled for debugging builds. Something > li

Re: Random panic in load_balance() with 3.16-rc

2014-07-26 Markus Trippelsdorf
On 2014.07.26 at 15:55 -0400, Theodore Ts'o wrote: > On Sat, Jul 26, 2014 at 09:35:57PM +0200, Markus Trippelsdorf wrote: > > > > But fortunately the workaround for the new inode.c bug is the same as > > for the original bug: -fno-var-tracking-assignments. > > > > It would make sense to enabled

Re: Random panic in load_balance() with 3.16-rc

2014-07-26 Markus Trippelsdorf
On 2014.07.26 at 12:56 -0700, Linus Torvalds wrote: > On Sat, Jul 26, 2014 at 12:35 PM, Markus Trippelsdorf > wrote: > > > > But fortunately the workaround for the new inode.c bug is the same as > > for the original bug: -fno-var-tracking-assignments. > > > > It would make sense to enabled it unco

Re: Random panic in load_balance() with 3.16-rc

2014-07-26 Linus Torvalds
On Sat, Jul 26, 2014 at 12:56 PM, Linus Torvalds wrote: > > Also, Michel - can you try this patch if you still have your > gcc-4.9.0 install, and send me the resulting fair.s file again? Hmm. The good news is that with that patch, the GCC_COMPARE_DEBUG build succeeds. At least for my small local

Re: Random panic in load_balance() with 3.16-rc

2014-07-26 Linus Torvalds
On Sat, Jul 26, 2014 at 12:35 PM, Markus Trippelsdorf wrote: > > But fortunately the workaround for the new inode.c bug is the same as > for the original bug: -fno-var-tracking-assignments. > > It would make sense to enabled it unconditionally for all debug > configurations for now. So how is cod

Re: Random panic in load_balance() with 3.16-rc

2014-07-26 Theodore Ts'o
On Sat, Jul 26, 2014 at 09:35:57PM +0200, Markus Trippelsdorf wrote: > > But fortunately the workaround for the new inode.c bug is the same as > for the original bug: -fno-var-tracking-assignments. > > It would make sense to enabled it unconditionally for all debug > configurations for now. Wha

Re: Random panic in load_balance() with 3.16-rc

2014-07-26 Markus Trippelsdorf
On 2014.07.26 at 11:39 -0700, Linus Torvalds wrote: > On Sat, Jul 26, 2014 at 11:28 AM, Linus Torvalds > wrote: > > > > That's a bit worrisome. I haven't actually checked if the code > > generation differs in significant ways yet.. > > Nope. Just three instructions that got re-ordered from ABC to

Re: Random panic in load_balance() with 3.16-rc

2014-07-26 Steven Chamberlain
Hi Michel, On 25/07/14 02:25, Michel Dänzer wrote: > Attached is fair.s from Debian gcc 4.8.3-5. Does that look better? I'm > going to try reproducing the problem with a kernel built by that now. It looks like gcc-4.9 Debian package version 4.9.1-2 available in sid/jessie may have already fixed t

Re: Random panic in load_balance() with 3.16-rc

2014-07-26 Linus Torvalds
On Sat, Jul 26, 2014 at 11:28 AM, Linus Torvalds wrote: > > That's a bit worrisome. I haven't actually checked if the code > generation differs in significant ways yet.. Nope. Just three instructions that got re-ordered from ABC to CAB in a way that makes no difference. But just the knowledge tha

Re: Random panic in load_balance() with 3.16-rc

2014-07-26 Linus Torvalds
On Fri, Jul 25, 2014 at 11:29 AM, Linus Torvalds wrote: > > I'm sure it's possible, but it sounds potentially complicated. Hmm. The bugzilla entry just taught me a new gcc flag: "-fcompare-debug". That apparently makes gcc compile things twice, once with debugging and once without, and verify tha

Re: Random panic in load_balance() with 3.16-rc

2014-07-25 Jakub Jelinek
On Fri, Jul 25, 2014 at 01:01:11PM -0700, Linus Torvalds wrote: > For example, gcc will not create a small stack frame with "sub > $8,%rsp". No, what gcc does is to use a random "push" instruction. > Fair enough, but that really makes things much harder to see. Here's > an example: That is because

Re: Random panic in load_balance() with 3.16-rc

2014-07-25 Steven Rostedt
On Fri, 25 Jul 2014 13:01:11 -0700 Linus Torvalds wrote: > For example, gcc will not create a small stack frame with "sub > $8,%rsp". No, what gcc does is to use a random "push" instruction. > Fair enough, but that really makes things much harder to see. Here's > an example: > > 81314

Re: Random panic in load_balance() with 3.16-rc

2014-07-25 Linus Torvalds
On Fri, Jul 25, 2014 at 11:29 AM, Linus Torvalds wrote: > > Some simple pattern to make sure that the "sub $frame-size,%rsp" comes > before any accesses to (%rbp) (when frame pointers are enabled) > *might* work, but it might also end up missing things. You're going to have a hard time doing that

Re: Random panic in load_balance() with 3.16-rc

2014-07-25 Steven Rostedt
On Fri, 25 Jul 2014 11:29:06 -0700 Linus Torvalds wrote: > On Fri, Jul 25, 2014 at 7:02 AM, Steven Rostedt wrote: > > > > But wouldn't it be rather trivial to run a static analyzer on the final > > vmlinux to make sure there are no red zones? I mean, you would only need > > to read each function

Re: Random panic in load_balance() with 3.16-rc

2014-07-25 Linus Torvalds
On Fri, Jul 25, 2014 at 7:02 AM, Steven Rostedt wrote: > > But wouldn't it be rather trivial to run a static analyzer on the final > vmlinux to make sure there are no red zones? I mean, you would only need > to read each function and check to make sure that the offset of rbp is > within the change

Re: Random panic in load_balance() with 3.16-rc

2014-07-25 Steven Rostedt
On Thu, Jul 24, 2014 at 08:55:28PM -0700, Alexei Starovoitov wrote: > > -mno-red-zone only affected prologue emition in gcc. This part didn't > change between the releases. So the bug is quite deep. > What seems to be happening is that 2nd pass of instruction scheduler > (after emit prologue and r

Re: Random panic in load_balance() with 3.16-rc

2014-07-25 Markus Trippelsdorf
On 2014.07.25 at 11:21 +0200, Markus Trippelsdorf wrote: > On 2014.07.25 at 18:03 +0900, Michel Dänzer wrote: > > On 25.07.2014 17:15, Linus Torvalds wrote: > > > On Thu, Jul 24, 2014 at 11:48 PM, Jakub Jelinek wrote: > > >> > > >> Can I ask anyone involved in this for preprocessed source and all

Re: Random panic in load_balance() with 3.16-rc

2014-07-25 Markus Trippelsdorf
On 2014.07.25 at 18:03 +0900, Michel Dänzer wrote: > On 25.07.2014 17:15, Linus Torvalds wrote: > > On Thu, Jul 24, 2014 at 11:48 PM, Jakub Jelinek wrote: > >> > >> Can I ask anyone involved in this for preprocessed source and all gcc > >> command > >> line options to reproduce it, best in the fo

Re: Random panic in load_balance() with 3.16-rc

2014-07-25 Michel Dänzer
On 25.07.2014 17:15, Linus Torvalds wrote: > On Thu, Jul 24, 2014 at 11:48 PM, Jakub Jelinek wrote: >> >> Can I ask anyone involved in this for preprocessed source and all gcc command >> line options to reproduce it, best in the form of a >> http://gcc.gnu.org/bugzilla/ >> bugreport? > > I've cr

Re: Random panic in load_balance() with 3.16-rc

2014-07-25 Linus Torvalds
On Thu, Jul 24, 2014 at 11:48 PM, Jakub Jelinek wrote: > > Can I ask anyone involved in this for preprocessed source and all gcc command > line options to reproduce it, best in the form of a > http://gcc.gnu.org/bugzilla/ > bugreport? I've created bug 61904 for this: https://gcc.gnu.org/bugzi

Re: Random panic in load_balance() with 3.16-rc

2014-07-24 Jakub Jelinek
On Thu, Jul 24, 2014 at 11:47:17AM -0700, Linus Torvalds wrote: > Adding Jakub to the cc, because gcc-4.9.0 seems to be terminally broken. ... > Jakub, any ideas? Can I ask anyone involved in this for preprocessed source and all gcc command line options to reproduce it, best in the form of a http:

Re: Random panic in load_balance() with 3.16-rc

2014-07-24 Nick Krause
On Thu, Jul 24, 2014 at 11:55 PM, Alexei Starovoitov wrote: > On Fri, Jul 25, 2014 at 10:25:03AM +0900, Michel Dänzer wrote: >> [ Adding the Debian kernel and gcc teams to Cc ] >> >> > movq $load_balance_mask, -136(%rbp) #, %sfp >> > subq $184, %rsp #, >> > >> > Anyway,

Re: Random panic in load_balance() with 3.16-rc

2014-07-24 Alexei Starovoitov
On Fri, Jul 25, 2014 at 10:25:03AM +0900, Michel Dänzer wrote: > [ Adding the Debian kernel and gcc teams to Cc ] > > > movq $load_balance_mask, -136(%rbp) #, %sfp > > subq $184, %rsp #, > > > > Anyway, this is not a kernel bug. This is your compiler creating > > compl

Re: Random panic in load_balance() with 3.16-rc

2014-07-24 Nick Krause
On Thu, Jul 24, 2014 at 10:33 PM, Linus Torvalds wrote: > On Thu, Jul 24, 2014 at 6:25 PM, Michel Dänzer wrote: >> >> Attached is fair.s from Debian gcc 4.8.3-5. Does that look better? I'm >> going to try reproducing the problem with a kernel built by that now. > > This looks better. For roughly

Re: Random panic in load_balance() with 3.16-rc

2014-07-24 Nick Krause
On Thu, Jul 24, 2014 at 9:25 PM, Michel Dänzer wrote: > [ Adding the Debian kernel and gcc teams to Cc ] > > On 25.07.2014 03:47, Linus Torvalds wrote: >> On Wed, Jul 23, 2014 at 6:43 PM, Michel Dänzer wrote: Michel, mind doing make kernel/sched/fair.s and sendin

Re: Random panic in load_balance() with 3.16-rc

2014-07-24 Linus Torvalds
On Thu, Jul 24, 2014 at 6:25 PM, Michel Dänzer wrote: > > Attached is fair.s from Debian gcc 4.8.3-5. Does that look better? I'm > going to try reproducing the problem with a kernel built by that now. This looks better. For roughly that same code sequence it does (ignoring the debug line and cfi

Re: Random panic in load_balance() with 3.16-rc

2014-07-24 Peter Zijlstra
On Thu, Jul 24, 2014 at 11:47:17AM -0700, Linus Torvalds wrote: > However, that constant spilling part just counts as "too stupid to > live". The real bug is this: > > movq $load_balance_mask, -136(%rbp) #, %sfp > subq $184, %rsp #, > > where gcc creates the stack fram

Re: Random panic in load_balance() with 3.16-rc

2014-07-24 Linus Torvalds
On Wed, Jul 23, 2014 at 6:43 PM, Michel Dänzer wrote: >> >> Michel, mind doing >> >> make kernel/sched/fair.s >> >> and sending us the resulting file? > > Here it is, gzipped, hope that's okay. > > Note that my tree is now based on 3.16-rc6. Ok, so I'm looking at the code generation and your

Re: Random panic in load_balance() with 3.16-rc

2014-07-24 Peter Zijlstra
On Thu, Jul 24, 2014 at 09:51:57AM +0200, Peter Zijlstra wrote: > > I hope the assembly output I sent earlier helps, I'm afraid bisecting > > this could be painful. > > Yeah, lemme go have a look... So I'm not seeing it, the cpus value is kept at -136(%rbp), so -128(%rbp) comes after and that's s

Re: Random panic in load_balance() with 3.16-rc

2014-07-24 Peter Zijlstra
On Thu, Jul 24, 2014 at 04:18:48PM +0900, Michel Dänzer wrote: > On 23.07.2014 18:31, Michel Dänzer wrote: > > On 23.07.2014 18:25, Peter Zijlstra wrote: > >> On Wed, Jul 23, 2014 at 10:28:19AM +0200, Peter Zijlstra wrote: > >> > >>> Of course, the other thing that patch did is clear sgp->power (no

Re: Random panic in load_balance() with 3.16-rc

2014-07-24 Michel Dänzer
On 23.07.2014 18:31, Michel Dänzer wrote: > On 23.07.2014 18:25, Peter Zijlstra wrote: >> On Wed, Jul 23, 2014 at 10:28:19AM +0200, Peter Zijlstra wrote: >> >>> Of course, the other thing that patch did is clear sgp->power (now >>> sgc->capacity). >> >> Hmm, re-reading the thread there isn't a cle

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Linus Torvalds
On Wed, Jul 23, 2014 at 12:02 PM, Peter Zijlstra wrote: > > Here goes.. Oh. So this doesn't have CPUMASK_OFFSTACK set at all, so the pointer has never been loaded from memory in the first place. The calculation has been (for me) something like movq $load_balance_mask, %rax add

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Linus Torvalds
On Wed, Jul 23, 2014 at 11:41 AM, Peter Zijlstra wrote: > > OK, that leaves us agreeing we want to clean that up, but still no > closer to explaining WTH happened on Michel's machine. Weird that. So looking at that destination pointer value (10043c803e8c), it *looks* like a pointer. Almost. I

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Peter Zijlstra
On Wed, Jul 23, 2014 at 11:35:06AM -0700, Linus Torvalds wrote: > But the code does appear to be correct. It just is messy, avoids the > proper abstractions, and generates suboptimal code for the off-stack > case. OK, that leaves us agreeing we want to clean that up, but still no closer to explain

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Linus Torvalds
On Wed, Jul 23, 2014 at 11:25 AM, Peter Zijlstra wrote: > On Wed, Jul 23, 2014 at 10:26:21AM -0700, Linus Torvalds wrote: >> >> The whole - and really *only* - point of __get_cpu_var is to get the >> address of a a cpu variable. If you want to read the *value* of the >> variable, you should use "t

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Peter Zijlstra
On Wed, Jul 23, 2014 at 11:07:40AM -0700, Linus Torvalds wrote: > - *static* per-cpu allocations might want to use "cpumask_var_t" (to > avoid having a full "struct cpumask_t") along with doing a > "zalloc_cpumask_var_node(..)" for each cpu. > > sched_init() follows that last pattern, except it

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Peter Zijlstra
On Wed, Jul 23, 2014 at 10:15:23AM -0700, Linus Torvalds wrote: > On Wed, Jul 23, 2014 at 10:04 AM, Peter Zijlstra wrote: > > On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote: > > > >> So the length is fine, and the disassembly shows that it is fixed (16 > >> 32-bit words - why the h

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Peter Zijlstra
On Wed, Jul 23, 2014 at 10:26:21AM -0700, Linus Torvalds wrote: > On Wed, Jul 23, 2014 at 10:12 AM, Linus Torvalds > wrote: > > > > sched_init() definitely does _not_ allocate a cpumask_var. > > Side note: another good rule of thumb for per-cpu variables is: > > - if you use __get_cpu_var() wit

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Peter Zijlstra
On Wed, Jul 23, 2014 at 10:12:35AM -0700, Linus Torvalds wrote: > On Wed, Jul 23, 2014 at 10:03 AM, Peter Zijlstra wrote: > > On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote: > >> > >> And I wonder if I have a clue. Look, load_balance_mask is a > >> "cpumask_var_t", but I don't see

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Peter Zijlstra
On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote: > Alternatively, keep it a "cpumask_var_t", but then you need to use > __get_cpu_pointer() to get the address of it, and use > "alloc_cpumask_var()" to allocate area for the OFFSTACK case. > > TOTALLY UNTESTED AND PROBABLY PURE CRAP P

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Linus Torvalds
On Wed, Jul 23, 2014 at 10:12 AM, Linus Torvalds wrote: > > A cpumask_var is TOTALLY DIFFERENT. It's *either* a cpumask _or_ just > a pointer to an externally allocated cpumask. > > sched_init() definitely does _not_ allocate a cpumask_var. I take that back. It does end up allocating it properly,

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Linus Torvalds
On Wed, Jul 23, 2014 at 10:12 AM, Linus Torvalds wrote: > > sched_init() definitely does _not_ allocate a cpumask_var. Side note: another good rule of thumb for per-cpu variables is: - if you use __get_cpu_var() without taking the address of it, you're doing something wrong and stupid. The who

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Linus Torvalds
On Wed, Jul 23, 2014 at 10:04 AM, Peter Zijlstra wrote: > On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote: > >> So the length is fine, and the disassembly shows that it is fixed (16 >> 32-bit words - why the heck does it use "movsl" rather than "movsq", >> whatever). > > Which is ex

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Linus Torvalds
On Wed, Jul 23, 2014 at 10:03 AM, Peter Zijlstra wrote: > On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote: >> >> And I wonder if I have a clue. Look, load_balance_mask is a >> "cpumask_var_t", but I don't see a "alloc_cpumask_var()" for it. >> That's broken with CONFIG_CPUMASK_OFFST

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Peter Zijlstra
On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote: > So the length is fine, and the disassembly shows that it is fixed (16 > 32-bit words - why the heck does it use "movsl" rather than "movsq", > whatever). Which is exactly right btw, he's got CONFIG_NR_CPUS=512 and 8*4*16=512.

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Peter Zijlstra
On Wed, Jul 23, 2014 at 09:54:23AM -0700, Linus Torvalds wrote: > On Wed, Jul 23, 2014 at 8:55 AM, Peter Zijlstra wrote: > >> > >> I haven't seen the full oops, can you forward the screenshot? The > >> exact register state might give some clues. > > > > Sure, here goes. > > So the length is fine,

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Linus Torvalds
On Wed, Jul 23, 2014 at 8:55 AM, Peter Zijlstra wrote: >> >> I haven't seen the full oops, can you forward the screenshot? The >> exact register state might give some clues. > > Sure, here goes. So the length is fine, and the disassembly shows that it is fixed (16 32-bit words - why the heck does

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Linus Torvalds
On Wed, Jul 23, 2014 at 7:24 AM, Peter Zijlstra wrote: > > No way either of those should generate a #GP. Puzzled. I haven't seen the full oops, can you forward the screenshot? The exact register state might give some clues. Linus

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Michel Dänzer
On 23.07.2014 23:24, Peter Zijlstra wrote: > On Wed, Jul 23, 2014 at 01:30:21PM +0200, Peter Zijlstra wrote: >> On Wed, Jul 23, 2014 at 01:11:10PM +0200, Peter Zijlstra wrote: >>> On Wed, Jul 23, 2014 at 10:45:46AM +0100, Dietmar Eggemann wrote: Doesn't the picture showing the captured panic r

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Peter Zijlstra
On Wed, Jul 23, 2014 at 01:30:21PM +0200, Peter Zijlstra wrote: > On Wed, Jul 23, 2014 at 01:11:10PM +0200, Peter Zijlstra wrote: > > On Wed, Jul 23, 2014 at 10:45:46AM +0100, Dietmar Eggemann wrote: > > > Doesn't the picture showing the captured panic reveal more information. > > > Haven't seen it

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Peter Zijlstra
On Wed, Jul 23, 2014 at 01:11:10PM +0200, Peter Zijlstra wrote: > On Wed, Jul 23, 2014 at 10:45:46AM +0100, Dietmar Eggemann wrote: > > Doesn't the picture showing the captured panic reveal more information. > > Haven't seen it myself, I just saw Peter's reply to your email > > Its a general prote

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Peter Zijlstra
On Wed, Jul 23, 2014 at 10:45:46AM +0100, Dietmar Eggemann wrote: > Doesn't the picture showing the captured panic reveal more information. > Haven't seen it myself, I just saw Peter's reply to your email Its a general protection fault from somewhere in load_balance(), I send you the picture. It

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Peter Zijlstra
On Wed, Jul 23, 2014 at 06:31:26PM +0900, Michel Dänzer wrote: > On 23.07.2014 18:25, Peter Zijlstra wrote: > > On Wed, Jul 23, 2014 at 10:28:19AM +0200, Peter Zijlstra wrote: > > > >> Of course, the other thing that patch did is clear sgp->power (now > >> sgc->capacity). > > > > Hmm, re-reading

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Dietmar Eggemann
On 23/07/14 10:31, Michel Dänzer wrote: > On 23.07.2014 18:25, Peter Zijlstra wrote: >> On Wed, Jul 23, 2014 at 10:28:19AM +0200, Peter Zijlstra wrote: >> >>> Of course, the other thing that patch did is clear sgp->power (now >>> sgc->capacity). >> >> Hmm, re-reading the thread there isn't a clear

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Michel Dänzer
On 23.07.2014 18:25, Peter Zijlstra wrote: > On Wed, Jul 23, 2014 at 10:28:19AM +0200, Peter Zijlstra wrote: > >> Of course, the other thing that patch did is clear sgp->power (now >> sgc->capacity). > > Hmm, re-reading the thread there isn't a clear confirmation its this > patch at all. Could y

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Peter Zijlstra
On Wed, Jul 23, 2014 at 10:28:19AM +0200, Peter Zijlstra wrote: > Of course, the other thing that patch did is clear sgp->power (now > sgc->capacity). Hmm, re-reading the thread there isn't a clear confirmation its this patch at all. Could you perhaps bisect this to either verify it is indeed th

Re: Random panic in load_balance() with 3.16-rc

2014-07-23 Peter Zijlstra
On Wed, Jul 23, 2014 at 05:05:24PM +0900, Michel Dänzer wrote: > On 23.07.2014 15:49, Peter Zijlstra wrote: > Attached. No FAIL messages yet. > [0.467570] __sdt_alloc: allocated 8802155ea4c0 with cpus: > [0.467574] __sdt_alloc: allocated 8802155ea3c0 with cpus: > [0.467576] _

Re: Random panic in load_balance() with 3.16-rc

2014-07-22 Peter Zijlstra
On Tue, Jul 22, 2014 at 09:21:40PM -0700, Linus Torvalds wrote: > On Tue, Jul 22, 2014 at 8:53 PM, Michel Dänzer wrote: > > > > Just happened again with the same change on top of 3.16-rc6. > > The (maybe) related bugzilla entry is just odd. Bruno Wolff reports > that the BUG_ON() in his added pat

Re: Random panic in load_balance() with 3.16-rc

2014-07-22 Linus Torvalds
On Tue, Jul 22, 2014 at 8:53 PM, Michel Dänzer wrote: > > Just happened again with the same change on top of 3.16-rc6. The (maybe) related bugzilla entry is just odd. Bruno Wolff reports that the BUG_ON() in his added patch triggers: + cpumask_clear(sched_group_cpus(sg)); +

Re: Random panic in load_balance() with 3.16-rc

2014-07-22 Michel Dänzer
On 22.07.2014 15:13, Michel Dänzer wrote: > On 18.07.2014 18:29, Michel Dänzer wrote: >> On 17.07.2014 16:58, Peter Zijlstra wrote: >>> On Thu, Jul 17, 2014 at 04:31:04PM +0900, Michel Dänzer wrote: I've been running into the panic captured in the attached picture (hope it's legible)

Re: Random panic in load_balance() with 3.16-rc

2014-07-21 Michel Dänzer
On 18.07.2014 18:29, Michel Dänzer wrote: > On 17.07.2014 16:58, Peter Zijlstra wrote: >> On Thu, Jul 17, 2014 at 04:31:04PM +0900, Michel Dänzer wrote: >>> >>> I've been running into the panic captured in the attached picture (hope >>> it's legible) randomly while running 3.16-rc4 and -rc5. I have

Re: Random panic in load_balance() with 3.16-rc

2014-07-18 Michel Dänzer
On 17.07.2014 16:58, Peter Zijlstra wrote: > On Thu, Jul 17, 2014 at 04:31:04PM +0900, Michel Dänzer wrote: >> >> I've been running into the panic captured in the attached picture (hope >> it's legible) randomly while running 3.16-rc4 and -rc5. I haven't >> noticed any pattern as to when it happens

Re: Random panic in load_balance() with 3.16-rc

2014-07-17 Peter Zijlstra
On Thu, Jul 17, 2014 at 04:31:04PM +0900, Michel Dänzer wrote: > > I've been running into the panic captured in the attached picture (hope > it's legible) randomly while running 3.16-rc4 and -rc5. I haven't > noticed any pattern as to when it happens; at least once it happened > while the box was