On Thu, Sep 06, 2012 at 11:36:06AM -0400, Eric Paris wrote:
> On Thu, Sep 6, 2012 at 11:08 AM, Dave Jones wrote:
> > Following on from the previous patch that fixed an oops, these
> > are all the other similar code patterns in the tree with the same
> > checks adde
On Thu, Sep 06, 2012 at 11:47:49AM -0400, Dave Jones wrote:
> > Not certain because I haven't looked at what happens with the error
> > code, but I think this might not be right. auditd can be explicitly
> > told not to audit certain events, in which case it is normal an
to_str+0x156/0x360
Cc: sta...@vger.kernel.org
Signed-off-by: Dave Jones
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bd92431..4ada3be 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2562,7 +2562,7 @@ int mpol_to_str(char *buffer, int maxlen, struct
mempolicy *pol, int no_c
On Thu, Sep 06, 2012 at 09:32:49AM -0700, Kees Cook wrote:
> > I just realised, the funny thing about this is that the machine running
> > that test
> > had selinux/audit disabled. And yet here we are, screwing around with
> > audit buffers.
>
> The intent was to have this message show up
On Sun, Oct 07, 2012 at 09:30:29AM -0700, Paul E. McKenney wrote:
> > I think Kconfig is mostly what distros would like to use; the thing is,
> > the Kconfig text needs to be there upfront when it's merged, not two
> > months later, since by then it's too late for a distro to notice.
> >
> > I'd bet
return checking, and also clears the buffer
beforehand.
Reported-by: Ben Hutchings
Cc: sta...@kernel.org
Signed-off-by: Dave Jones
---
unanswered question: why are the buffer sizes here different ? which is correct?
diff -durpN '--exclude-from=/home/davej/.exclude'
src/git-tr
On Mon, Oct 08, 2012 at 11:09:49AM -0400, Dave Jones wrote:
> Last month I sent in 80de7c3138ee9fd86a98696fd2cf7ad89b995d0a to remove
> a user triggerable BUG in mempolicy.
>
> Ben Hutchings pointed out to me that my change introduced a potential leak
> of stack conten
On Mon, Oct 08, 2012 at 01:35:42PM -0700, David Rientjes wrote:
> > unanswered question: why are the buffer sizes here different ? which is
> > correct?
> >
> Given the current set of mempolicy modes and flags, it's 34, but this can
> change if new modes or flags are added with longer name
Just hit this..
WARNING: at lib/debugobjects.c:261 debug_print_object+0x8c/0xb0()
ODEBUG: free active (active state 0) object type: work_struct hint:
flush_to_ldisc+0x0/0x1a0
Modules linked in: fuse ipt_ULOG nfnetlink tun binfmt_misc nfc caif_socket caif
phonet can llc2 pppoe pppox ppp_generic s
On Wed, Oct 10, 2012 at 10:11:44AM +0200, Jiri Slaby wrote:
> On 10/10/2012 06:26 AM, Dave Jones wrote:
> > Just hit this..
>
> That'd be me perhaps. Do you have some serial device connected? Or is it
> a pure terminals + ptys?
There's a usb serial tty conn
On Tue, Oct 02, 2012 at 01:26:31PM +0200, Ralf Hildebrandt wrote:
> WARNING: at arch/x86/kernel/apic/ipi.c:109
> default_send_IPI_mask_logical+0x97/0xc7()
> Hardware name: ProLiant DL360 G4
> empty IPI mask
> Modules linked in: nfnetlink_log nfnetlink ipv6 tg3 microcode rng_core
> psmouse
On Fri, Oct 05, 2012 at 06:06:12PM -0400, Aristeu Rozanski wrote:
> Hi Dave,
> On Fri, Oct 05, 2012 at 05:59:29PM -0400, Dave Jones wrote:
> > On boot in Linus' current tree..
> >
> > ===
> > [ INFO: suspicious RCU u
On Tue, Oct 16, 2012 at 10:24:32PM -0700, David Rientjes wrote:
> On Wed, 17 Oct 2012, Dave Jones wrote:
>
> > BUG: sleeping function called from invalid context at kernel/mutex.c:269
>
> Hmm, looks like we need to change the refcount semantics entirely. We
On Wed, Oct 17, 2012 at 12:21:10PM -0700, David Rientjes wrote:
> On Wed, 17 Oct 2012, Dave Jones wrote:
>
> > On Tue, Oct 16, 2012 at 10:24:32PM -0700, David Rientjes wrote:
> > > On Wed, 17 Oct 2012, Dave Jones wrote:
> > >
> > > > BUG: sleepi
ces as a result of this change causing
> the mempolicies to never be freed. ("numa_policy" turns out to be
> policy_cache in the code, so thanks for checking both of them.)
>
> Could I add your tested-by?
Sure. Here's a fresh one I just baked.
Tested-by: Dave Jones
Triggered while fuzz testing..
BUG: MAX_LOCKDEP_ENTRIES too low!
turning off the locking correctness validator.
Pid: 22788, comm: kworker/2:1 Not tainted 3.7.0-rc1+ #34
Call Trace:
[] add_lock_to_list.isra.29.constprop.45+0xdd/0xf0
[] __lock_acquire+0x1121/0x1ba0
[] lock_acquire+0xa2/0x220
[]
I was curious why sys_kcmp wasn't working, which led me to the testcase.
It turned out I hadn't enabled CHECKPOINT_RESTORE in the kernel I was testing.
Add a decoding of errno to the testcase to make that obvious.
Signed-off-by: Dave Jones
diff --git a/tools/testing/selftests/kcmp/k
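
A minimal sketch of the idea described above, not the patch itself: decode errno after a failed kcmp call so that a kernel built without CHECKPOINT_RESTORE shows up as an explicit ENOSYS hint instead of a bare failure. The helper names `explain_kcmp_errno` and `report_kcmp_failure` are hypothetical, and the hint strings are illustrative wording only.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map the errno from a failed kcmp() to a hint. */
static const char *explain_kcmp_errno(int err)
{
	switch (err) {
	case ENOSYS:
		/* The syscall is compiled out when the option is disabled. */
		return "kcmp not implemented (is CHECKPOINT_RESTORE enabled?)";
	case EPERM:
		return "insufficient permission to inspect the target process";
	case ESRCH:
		return "no such process";
	default:
		/* Fall back to the C library's generic description. */
		return strerror(err);
	}
}

/* Hypothetical helper: print the decoded error, return characters written. */
static int report_kcmp_failure(FILE *out, int err)
{
	return fprintf(out, "kcmp failed: errno %d (%s)\n",
		       err, explain_kcmp_errno(err));
}
```

A test case would call `report_kcmp_failure(stderr, errno)` right after the syscall returns -1, before anything else can overwrite errno.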
On Thu, Oct 18, 2012 at 07:53:08AM +0200, Jens Axboe wrote:
> On 2012-10-18 03:53, Dave Jones wrote:
> > Triggered while fuzz testing..
> >
> >
> > BUG: MAX_LOCKDEP_ENTRIES too low!
> > turning off the locking correctness validator.
> > Pid: 22788,
I've hit this twice in the last two days while fuzz testing.
(Both times on i686 only, my x86-64 tests aren't hitting it
for some reason).
BUG: unable to handle kernel paging request at 6b6b6ce3
IP: [] module_put+0x1e/0x160
*pdpt = 25a4b001 *pde =
Oops: [#1] PREEMPT
On Fri, Oct 19, 2012 at 10:43:51AM -0400, Dave Jones wrote:
> I've hit this twice in the last two days while fuzz testing.
> (Both times on i686 only, my x86-64 tests aren't hitting it
> for some reason).
>
> BUG: unable to handle kernel paging request at 6b6b6ce3
On Fri, Oct 19, 2012 at 02:49:32PM +0200, Peter Zijlstra wrote:
> Of course, if you do run out of lock classes, the next thing to do is
> to find the offending lock classes. First, the following command gives
> you the number of lock classes currently in use along with the maximum:
>
>
[ 66.104790] WARNING: at drivers/gpu/drm/radeon/rs600.c:571
rs600_irq_set+0x1e2/0x200()
[ 66.105748] Hardware name: GA-MA78GM-S2H
[ 66.106220] Can't enable IRQ/MSI because no handler is installed
[ 66.106935] Modules linked in:
[ 66.107329] Pid: 43, comm: kworker/0:1 Not tainted 3.7.0-rc
On Tue, Dec 11, 2012 at 04:23:27PM +0800, Fengguang Wu wrote:
> On Wed, Nov 28, 2012 at 09:55:15AM -0500, Dave Jones wrote:
> > We had a user report the soft lockup detector kicked after 22
> > seconds of no progress, with this trace..
>
> Where is the original report?
(Taint comes from previous r600 bug reported here
https://lkml.org/lkml/2012/12/8/131)
[35662.070628] BUG: unable to handle kernel NULL pointer dereference at
(null)
[35662.071719] IP: [] r100_debugfs_cp_ring_info+0x115/0x140
[35662.072652] PGD b4c17067 PUD b69d1067 PMD 0
[35662.07324
This says rc8+, but it's just missing the Makefile change, so it's still there
in 3.7
Curious that firefox was the process mentioned here, as ~/.mozilla isn't on xfs.
My only xfs partition is /data holding a kernel source tree & .ccache
Dave
[30557.769727] ==
When we see reports like https://bugzilla.redhat.com/show_bug.cgi?id=883576
it might be useful to know what modules had been loaded, so they can be compared
with similar reports to see if there is a common suspect.
Signed-off-by: Dave Jones
diff --git a/mm/memory.c b/mm/memory.c
index 221fc9f
Looks like we're doing a double-init on a timer.
I had been experimenting with powertop, so that may have triggered something
maybe suspend/resume related ?
(says -rc8, but it's only missing the Makefile change)
[14844.560489] WARNING: at lib/debugobjects.c:261 debug_print_object+0x8c/0xb0()
[14
Fuzz-testing fallout from post 3.7 tree as of commit
414a6750e59b0b687034764c464e9ddecac0f7a6
[ 2181.230579] [ cut here ]
[ 2181.231277] WARNING: at drivers/tty/tty_buffer.c:476
flush_to_ldisc+0x1de/0x1f0()
[ 2181.232358] Hardware name: GA-MA78GM-S2H
[ 2181.232925] tty is
On Tue, Nov 13, 2012 at 07:50:25PM -0800, Hugh Dickins wrote:
> Originally I was waiting to hear further from Dave; but his test
> machine was giving trouble, and it occurred to me that, never mind
> whether he says he has hit it again, or he has not hit it again,
> the answer is the same: do
On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote:
> -/* We already confirmed swap, and make no allocation */
> -VM_BUG_ON(error);
> +/*
> + * We already confirmed swap under page lock, and make
> +
On Tue, Nov 06, 2012 at 04:11:00PM +, Alan Cox wrote:
> > But a deadlock we have lived with for years. Without reverting,
> > we're prevented from discovering all the new deadlocks we're adding.
>
> We lived with it locking boxes up on users but not knowing why.
Circa 3.5 we got a lot m
While fuzz-testing, I frequently run into this..
trinity-child4: page allocation failure: order:4, mode:0x40d0
Pid: 21842, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54
Call Trace:
[] warn_alloc_failed+0xe9/0x150
[] ? __alloc_pages_direct_compact+0x1f8/0x209
[] __alloc_pages_nodemask+0x936/0x
On Tue, Nov 06, 2012 at 03:02:21PM -0600, Nathan Zimmer wrote:
> On systems with 4096 cores, attempting to read /proc/sched_debug fails.
> We are trying to push all the data into a single kmalloc buffer.
> The issue is that on these very large machines all the data will not fit in 4MB.
>
> A better
On Tue, Nov 06, 2012 at 05:24:15PM -0600, Nathan Zimmer wrote:
> On Tue, Nov 06, 2012 at 04:31:28PM -0500, Dave Jones wrote:
> > On Tue, Nov 06, 2012 at 03:02:21PM -0600, Nathan Zimmer wrote:
> > > On systems with 4096 cores, attempting to read /proc/sched_debug fails.
>
On Tue, Nov 06, 2012 at 04:15:35PM -0800, Julius Werner wrote:
> tcp_recvmsg contains a sanity check that WARNs when there is a gap
> between the socket's copied_seq and the first buffer in the
> sk_receive_queue. In theory, the TCP stack makes sure that This Should
> Never Happen (TM)... howev
On Tue, Nov 06, 2012 at 05:51:19PM -0800, Julius Werner wrote:
> > We've had reports of this WARN against the Fedora kernel for a while.
> > Had this been immediately followed by a BUG(), we'd have never seen those
> > traces at all,
> > and just got "my machine just locked up" reports instead
On Wed, Nov 07, 2012 at 08:29:12AM -0800, Eric Dumazet wrote:
> On Wed, 2012-11-07 at 10:54 -0500, Dave Jones wrote:
>
> > It sounds more appropriate to me, instead of silently wedging the box.
> > At least with that approach we have a chance of finding out what happened.
>
On Wed, Nov 07, 2012 at 09:05:02AM -0800, Eric Dumazet wrote:
> On Wed, 2012-11-07 at 11:43 -0500, Dave Jones wrote:
>
> > dude, look at the bug reports I just pointed you at.
> > People _are_ aware there are bugs there.
> >
> If I remember well, I helped to fix
On Tue, Nov 06, 2012 at 03:48:20PM -0800, Hugh Dickins wrote:
> > [ cut here ]
> > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
> > Hardware name: 2012 Client Platform
> > Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54
>
> That's the very
This happened to a box I left running fuzz tests over the holidays.
schedule_timeout: wrong timeout value fff0
Pid: 6606, comm: trinity-child1 Not tainted 3.8.0-rc1+ #43
Call Trace:
[] schedule_timeout+0x305/0x340
[] ? preempt_schedule+0x42/0x60
[] ? _raw_spin_unlock_irqrestore+0x7
This happened a few times to my test boxes I left running over the holidays..
[ 8419.797533] [ cut here ]
[ 8419.798341] WARNING: at drivers/tty/tty_buffer.c:476
flush_to_ldisc+0x1de/0x1f0()
[ 8419.800313] Hardware name: GA-MA78GM-S2H
[ 8419.800887] tty is NULL
[ 8419.8018
We've had an increased number of reports in the last six months or so
from Fedora users getting corrupted page tables.
At first I wrote it off to bad hardware, but they started happening frequently
enough that I began to wonder if it was a real problem.
The only common thing I could think of was th
Along the same lines as 779302e67835fe9a6b74327e54969ba59cb3478a, xattrs
can cause big allocations, which are likely to fail under memory pressure..
[20539.081122] trinity-child3: page allocation failure: order:4, mode:0x1040d0
[20539.090405] Pid: 27617, comm: trinity-child3 Not tainted 3.8.0-rc1+
On Wed, Jan 02, 2013 at 11:01:15AM -0500, Chris Mason wrote:
> > [52460.280346] BUG: Bad page map in process panel-6-systray
> > pte:8800b665a0e8 pmd:b6659067
> > [52460.280848] addr:0038bf3fd000 vm_flags:0070 anon_vma:
> > (null) mapping:88011052fd98 index:1fd
> >
I have no idea what happened here, but this is the first time I've seen this
one.
This was running a tree pulled yesterday afternoon.
BUG: unable to handle kernel paging request at 880100201000
IP: [] copy_page_rep+0x5/0x10
PGD 1c0c063 PUD cfbff067 PMD cfc01067 PTE 800100201160
Oops:
(At least I think that's where 'cpu_list_show' comes from...
those preprocessor tricks confuse ctags)
Just started seeing this today..
(fwiw, cpu is a Phenom(tm) 9750)
Dave
WARNING: at kernel/mutex.c:198 mutex_lock_nested+0x39c/0x3b0()
Hardware name: GA-MA78GM-S2H
Modules linked in: hid
From Linus' tree as of a half hour ago.
echo 0 > /sys/devices/system/cpu/cpu1/online
[ 67.675171] ==
[ 67.676121] [ INFO: possible circular locking dependency detected ]
[ 67.677084] 3.7.0+ #34 Not tainted
[ 67.677641] ---
Did a mount from a client (also running Linus current), and the
server spat this out..
[ 6936.306135] [ cut here ]
[ 6936.306154] WARNING: at net/sunrpc/clnt.c:617
rpc_shutdown_client+0x12a/0x1b0 [sunrpc]()
[ 6936.306156] Hardware name:
[ 6936.306157] Modules link
On Wed, Oct 24, 2012 at 09:44:07PM -0700, Darren Hart wrote:
> > I've been able to trigger this for the last week or so.
> > Unclear whether this is a new bug, or my fuzzer got smarter, but I see the
> > pi-futex code hasn't changed since the last time it found something..
> >
> > > BUG: u
Slightly old kernel, but likely still relevant judging by recent commits.
Hit an oom condition while fuzz testing, but what's interesting is what
happened immediately afterwards.. (the lockup)
Could this just be that the kernel was so busy recovering from swapping etc
that watchdog perceived that
On Thu, Jun 06, 2013 at 05:43:13PM +0200, Frederic Weisbecker wrote:
> > Every process 200% or 0%.
>
> I see, would you mind testing this branch?
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
> timers/urgent
>
> It might help, I specially think about
> 45eacc69277
3.10 seems to have a problem with dirty RAID5 sets.
I've got a machine that panics on boot during RAID5 activation.
After switching the BUG_ON to a WARN_ON, I was able to get this over serial
console..
md/raid:md0: not clean -- starting background reconstruction
md/raid:md0: device sdd1 operatio
On Wed, Jun 12, 2013 at 03:43:46PM -0400, Dave Jones wrote:
> 3.10 seems to have a problem with dirty RAID5 sets.
>
> I've got a machine that panics on boot during RAID5 activation.
> After switching the BUG_ON to a WARN_ON, I was able to get this over serial
> console.
Hit this while running this script in a loop..
https://github.com/kernelslacker/io-tests/blob/master/setup.sh
[34385.251507] [ cut here ]
[34385.254068] WARNING: at fs/btrfs/inode.c:7961
btrfs_destroy_inode+0x265/0x2e0 [btrfs]()
[34385.257275] Modules linked in: vmw_vsock_
On Thu, Jun 06, 2013 at 09:07:52AM +0200, Lukasz Majewski wrote:
> +config CPU_FREQ_BOOST
> +bool "CPU frequency boost support"
> +help
> + Switch to enable support for frequency boost
> +
> + If in doubt, say N.
> +
This help text is devoid of any useful information.
On
On Wed, Jun 05, 2013 at 10:54:26AM -0700, Greg KH wrote:
> Here are 3 patches that I've tested out on my system with only a small
> number of devices, but it seems to work, so why not let others try it
> out...
>
> These patches make the USB to serial core have the ability to support up
> to
On Thu, Jun 06, 2013 at 05:14:31PM +0200, Lukasz Majewski wrote:
> Hi Dave,
>
> > On Thu, Jun 06, 2013 at 09:07:52AM +0200, Lukasz Majewski wrote:
> >
> > > +config CPU_FREQ_BOOST
> > > + bool "CPU frequency boost support"
> > > + help
> > > + Switch to enable supp
On Tue, May 14, 2013 at 03:21:07AM +0200, Frederic Weisbecker wrote:
> On Thu, May 09, 2013 at 05:10:26PM -0400, Dave Jones wrote:
> > On Thu, May 09, 2013 at 11:02:08PM +0200, Frederic Weisbecker wrote:
> >
> > > > RCU options for this build are..
> > >
On Thu, Jun 06, 2013 at 02:59:44PM -0500, Jesse Larrew wrote:
> >pr_debug("%s: xmit %p %pM\n", vi->dev->name, skb, dest);
> > + if (vi->mergeable_rx_bufs)
> > + hdr_len = sizeof hdr->mhdr;
> > + else
> > + hdr_len = sizeof hdr->hdr;
>
> All conditionals need braces
On Thu, Jun 06, 2013 at 05:43:13PM +0200, Frederic Weisbecker wrote:
> > Every process 200% or 0%.
>
> I see, would you mind testing this branch?
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
> timers/urgent
>
> It might help, I specially think about
> 45eacc6927
On Mon, Jun 17, 2013 at 01:39:42PM -0400, Chris Mason wrote:
> Quoting Dave Jones (2013-06-17 09:49:55)
> > Hit this while running this script in a loop..
> > https://github.com/kernelslacker/io-tests/blob/master/setup.sh
> > [34385.251507] -
On Mon, Jun 17, 2013 at 02:42:27PM -0400, Chris Mason wrote:
> Quoting Dave Jones (2013-06-17 14:20:06)
> > On Mon, Jun 17, 2013 at 01:39:42PM -0400, Chris Mason wrote:
> > > Quoting Dave Jones (2013-06-17 09:49:55)
> > > > Hit this while running this scri
Something else I've seen a few times from my io script
(Always during btrfs runs)...
BUG: MAX_LOCKDEP_CHAINS too low!
turning off the locking correctness validator.
Please attach the output of /proc/lock_stat to the bug report
CPU: 1 PID: 492255 Comm: kworker/u8:0 Not tainted 3.10.0-rc6+ #6
Hardwa
I see this during boot-up:
i5k_amb: probe of i5k_amb.0 failed with error -16
EDAC MC: Ver: 3.0.0
BUG: key ee78135c not in .data!
[ cut here ]
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x34f/0x37c()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in:
i5000_edac(+) edac_core
Reading /proc/dri/0/vma causes bad things to happen on a box with nouveau
loaded.
(Note, no X running on that box)
Trace below shows trinity, but I can reproduce it with just cat /proc/dri/0/vma
[ cut here ]
kernel BUG at arch/x86/mm/physaddr.c:79!
invalid opcode: [#
On Mon, Jun 17, 2013 at 09:49:27PM -0400, David Airlie wrote:
>
> > Reading /proc/dri/0/vma causes bad things to happen on a box with nouveau
> > loaded.
> > (Note, no X running on that box)
> >
> > Trace below shows trinity, but I can reproduce it with just cat
> > /proc/dri/0/vma
>
>
Also remove another commented out open-coded list manipulation while we're
there.
Signed-off-by: Dave Jones
---
drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c
b/dr
Signed-off-by: Dave Jones
---
net/sctp/protocol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
netdev: patches 1-4 & 6 are independent of this, and 7 won't be
merged until this one gets to Linus' tree.
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index ea
Signed-off-by: Dave Jones
---
drivers/net/wireless/ipw2x00/ipw2200.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ipw2x00/ipw2200.c
b/drivers/net/wireless/ipw2x00/ipw2200.c
index d96257b..4ed5e45 100644
--- a/drivers/net/wireless/ipw2x00/ipw2200.c
Signed-off-by: Dave Jones
---
drivers/gpu/drm/radeon/mkregtable.c | 13 -
1 file changed, 13 deletions(-)
diff --git a/drivers/gpu/drm/radeon/mkregtable.c
b/drivers/gpu/drm/radeon/mkregtable.c
index 5a82b6b..af85299 100644
--- a/drivers/gpu/drm/radeon/mkregtable.c
+++ b/drivers/gpu
Also remove commented out list manipulation
Signed-off-by: Dave Jones
---
drivers/staging/rtl8187se/ieee80211/ieee80211_rx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/rtl8187se/ieee80211/ieee80211_rx.c
b/drivers/staging/rtl8187se/ieee80211
__list_for_each used to be the non prefetch() aware list walking primitive.
When we removed the prefetch macros from the list routines, it became
redundant. Given it does exactly the same thing as list_for_each now,
we might as well remove it and call list_for_each directly.
Signed-off-by: Dave
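
To illustrate the redundancy described above, here is a userspace sketch, assuming a simplified stand-in for the kernel's `struct list_head`. The macro `old_list_for_each` is a hypothetical name standing in for `__list_for_each`, and the helpers `list_init`, `list_add_tail`, `count_new` and `count_old` exist only for this example. With no prefetch hint in either macro, the two expand to the same loop.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* The walker that remains: a plain pointer chase with no prefetch hint. */
#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

/* Hypothetical name standing in for the removed __list_for_each. */
#define old_list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

/* An empty list is a head that points at itself in both directions. */
static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link node in just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Count entries using the surviving macro. */
static size_t count_new(struct list_head *head)
{
	struct list_head *pos;
	size_t n = 0;

	list_for_each(pos, head)
		n++;
	return n;
}

/* Count entries using the stand-in for the removed macro. */
static size_t count_old(struct list_head *head)
{
	struct list_head *pos;
	size_t n = 0;

	old_list_for_each(pos, head)
		n++;
	return n;
}
```

Both counters visit the same nodes in the same order, so any caller of the old macro can switch to `list_for_each` without a behaviour change.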
Signed-off-by: Dave Jones
---
sound/usb/misc/ua101.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/usb/misc/ua101.c b/sound/usb/misc/ua101.c
index 6ad617b..8b5d2c5 100644
--- a/sound/usb/misc/ua101.c
+++ b/sound/usb/misc/ua101.c
@@ -1349,7 +1349,7 @@ static void
The DR registers are rarely useful when decoding oopses.
With screen real estate during oopses at a premium, we can save two lines
by only printing out these registers when they are set to something other
than their power-on state.
Signed-off-by: Dave Jones
diff -durpN '--exclude-from=
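
A rough userspace sketch of the approach described above, not the actual patch: skip the two debug-register lines when every register still holds its power-on value. The struct, the function names and the output format are hypothetical; the reset values used here (DR0-DR3 zero, DR6 0xFFFF0FF0, DR7 0x400) are the architectural defaults as documented by Intel, and treating exactly those as "not worth printing" is an assumption about the patch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Architectural reset values; assumed to be what the check compares against. */
#define DR6_POWER_ON 0xFFFF0FF0UL
#define DR7_POWER_ON 0x00000400UL

/* Hypothetical container for the registers an oops would print. */
struct debug_regs {
	unsigned long dr0, dr1, dr2, dr3, dr6, dr7;
};

/* True when no breakpoint is configured and no status bit is set. */
static bool debug_regs_are_default(const struct debug_regs *r)
{
	return r->dr0 == 0 && r->dr1 == 0 && r->dr2 == 0 && r->dr3 == 0 &&
	       r->dr6 == DR6_POWER_ON && r->dr7 == DR7_POWER_ON;
}

/* Print the registers only when they carry information; return lines used. */
static int print_debug_regs(FILE *out, const struct debug_regs *r)
{
	if (debug_regs_are_default(r))
		return 0;

	fprintf(out, "DR0: %016lx DR1: %016lx DR2: %016lx\n",
		r->dr0, r->dr1, r->dr2);
	fprintf(out, "DR3: %016lx DR6: %016lx DR7: %016lx\n",
		r->dr3, r->dr6, r->dr7);
	return 2;
}
```

In the common case the function prints nothing, which is where the two saved lines come from.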
On Tue, Jun 18, 2013 at 10:43:56AM +0200, Borislav Petkov wrote:
> On Tue, Jun 18, 2013 at 12:11:32AM -0400, Dave Jones wrote:
> > The DR registers are rarely useful when decoding oopses.
> > With screen real estate during oopses at a premium, we can save two lines
> >
On Tue, Jun 18, 2013 at 11:58:21AM +0200, Peter Zijlstra wrote:
> > Peter, Are you going to take the preempt_schedule_context() patch?
>
> I have it queued, I just seem to have some problems locating Ingo to
> stuff patches into -tip :/
>
> Will continue prodding.. Ingo if you're reading!
-off-by: Dave Jones
diff -durpN '--exclude-from=/home/davej/.exclude'
/home/davej/src/kernel/git-trees/linux/arch/x86/kernel/process_64.c
linux-dj/arch/x86/kernel/process_64.c
--- linux/arch/x86/kernel/process_64.c 2013-05-01 10:02:52.064151923 -0400
+++ linux-dj/arch/x86/kernel/pr
On Wed, Jun 19, 2013 at 08:24:41PM +0530, Viresh Kumar wrote:
> On 19 June 2013 17:52, Simon Horman wrote:
> > I have no objections to this change but at the same time I don't
> > feel that I know the code well enough to review it.
>
> Probably I made a mistake adding your name. Don't know h
I've been hitting this a lot the last few days.
This is the same machine that I was also seeing lockups during sync()
Dave
BUG: soft lockup - CPU#1 stuck for 22s! [trinity-child9:6902]
Modules linked in: bridge snd_seq_dummy dlci bnep fuse 8021q garp stp hidp tun
rfcomm can_raw ipt_ULOG
On Wed, Jun 19, 2013 at 12:45:40PM -0400, Dave Jones wrote:
> I've been hitting this a lot the last few days.
> This is the same machine that I was also seeing lockups during sync()
On a whim, I reverted 971394f389992f8462c4e5ae0e3b49a10a9534a3
(As I started seeing these just aft
On Wed, Jun 19, 2013 at 02:02:33PM -0400, Chris Mason wrote:
> Quoting Dave Jones (2013-06-17 14:58:10)
> > On Mon, Jun 17, 2013 at 02:42:27PM -0400, Chris Mason wrote:
> > > Quoting Dave Jones (2013-06-17 14:20:06)
> > > > On Mon, Jun 17, 2013 at 01:39
On Wed, Jun 19, 2013 at 11:13:02AM -0700, Paul E. McKenney wrote:
> > On a whim, I reverted 971394f389992f8462c4e5ae0e3b49a10a9534a3
> > (As I started seeing these just after that rcu merge).
> >
> > It's only been 30 minutes, but it seems stable again. Normally I would
> > hit these within
On Wed, Jun 19, 2013 at 11:13:02AM -0700, Paul E. McKenney wrote:
> On Wed, Jun 19, 2013 at 01:53:56PM -0400, Dave Jones wrote:
> > On Wed, Jun 19, 2013 at 12:45:40PM -0400, Dave Jones wrote:
> > > I've been hitting this a lot the last few days.
> > > This is
On Thu, Jun 20, 2013 at 09:16:52AM -0700, Paul E. McKenney wrote:
> On Wed, Jun 19, 2013 at 08:12:12PM -0400, Dave Jones wrote:
> > On Wed, Jun 19, 2013 at 11:13:02AM -0700, Paul E. McKenney wrote:
> > > On Wed, Jun 19, 2013 at 01:53:56PM -0400, Dave Jones wrote:
> >
On Wed, Jun 12, 2013 at 11:34:07AM -0400, Dave Jones wrote:
> On Thu, Jun 06, 2013 at 05:43:13PM +0200, Frederic Weisbecker wrote:
>
> > > Every process 200% or 0%.
> >
> > I see, would you mind testing this branch?
> >
> > git://git.kernel.o
On Mon, Jun 03, 2013 at 07:49:59PM -0700, Greg Kroah-Hartman wrote:
> On Mon, May 27, 2013 at 02:28:51PM +0200, Bjørn Mork wrote:
> > But, IMHO, a nicer approach would be to make the allocation completely
> > dynamic, using e.g. the idr subsystem. Static tables always feel
> > like straight
On Thu, Jun 20, 2013 at 09:16:52AM -0700, Paul E. McKenney wrote:
> > > > > I've been hitting this a lot the last few days.
> > > > > This is the same machine that I was also seeing lockups during
> > sync()
> > > >
> > > > On a whim, I reverted 971394f389992f8462c4e5ae0e3b49a10a9534a3
On Fri, Jun 21, 2013 at 09:59:49PM +0200, Oleg Nesterov wrote:
> I am puzzled. And I do not really understand
>
> hardirqs last enabled at (2380318): []
> restore_args+0x0/0x30
> hardirqs last disabled at (2380319): []
> apic_timer_interrupt+0x6a/0x80
> softirqs last ena
On Sat, Jun 22, 2013 at 07:31:29PM +0200, Oleg Nesterov wrote:
> > [ 7485.261299] WARNING: at include/linux/nsproxy.h:63
> > get_proc_task_net+0x1c8/0x1d0()
> > [ 7485.262021] Modules linked in: 8021q garp stp tun fuse rfcomm bnep hidp
> > snd_seq_dummy nfnetlink scsi_transport_iscsi can_bc
On Sun, Jun 23, 2013 at 04:36:34PM +0200, Oleg Nesterov wrote:
> > > Dave, I am sorry but all I can do is to ask you to do more testing.
> > > Could you please reproduce the lockup again on the clean Linus's
> > > current ? (and _without_ reverting 8aac6270, of course).
> >
> > I'll give
On Sun, Jun 23, 2013 at 06:04:52PM +0200, Oleg Nesterov wrote:
> > [11018.927809] [sched_delayed] sched: RT throttling activated
> > [11054.897670] BUG: soft lockup - CPU#2 stuck for 22s!
> > [trinity-child2:14482]
> > [11054.898503] Modules linked in: bridge stp snd_seq_dummy tun fuse hidp
On Sun, Jun 23, 2013 at 06:04:52PM +0200, Oleg Nesterov wrote:
> Could you please do the following:
>
> 1. # cd /sys/kernel/debug/tracing
> # echo 0 >> options/function-trace
> # echo preemptirqsoff >> current_tracer
dammit.
WARNING: at include/linux/list.h:385 rb_hea
On Sun, Jun 23, 2013 at 06:04:52PM +0200, Oleg Nesterov wrote:
> > [11054.897670] BUG: soft lockup - CPU#2 stuck for 22s!
> > [trinity-child2:14482]
> > [11054.898503] Modules linked in: bridge stp snd_seq_dummy tun fuse hidp
> > bnep rfcomm can_raw ipt_ULOG can_bcm nfnetlink af_rxrpc llc2 ro
On Mon, Jun 24, 2013 at 10:52:29AM -0400, Steven Rostedt wrote:
> > > check_list_nodes corruption. next->prev should be prev
> > > (88023b8a1a08), but was 0088023b8a1a. (next=880243288001).
> >
> > Can't find "check_list_nodes" in lib/list_debug.c or elsewhere...
> >
> > >
On Mon, Jun 24, 2013 at 06:37:08PM +0200, Oleg Nesterov wrote:
> On 06/24, Dave Jones wrote:
> >
> > On Mon, Jun 24, 2013 at 10:52:29AM -0400, Steven Rostedt wrote:
> >
> > > > > check_list_nodes corruption. next->prev should be prev
> > (8
On Mon, Jun 24, 2013 at 12:24:39PM -0400, Steven Rostedt wrote:
> > Ah, this is the first victim of my new 'check sanity of nodes during list
> > walks' patch.
> > It's doing the same prev->next next->prev checking as list_add and friends.
> > I'm looking at getting it into shape for a 3.12 m
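
The check described above can be sketched in userspace as follows. This is only an illustration under stated assumptions, not the patch under discussion: `struct list_head` is a simplified stand-in for the kernel type, and `node_links_ok` and `checked_list_count` are hypothetical names. The walk applies the same prev->next and next->prev consistency test to each node that the list debugging code applies on insertion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link node in just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* A node is consistent when both of its neighbours point back at it. */
static bool node_links_ok(const struct list_head *node)
{
	return node->prev->next == node && node->next->prev == node;
}

/* Count entries, or return -1 at the first node whose links disagree. */
static long checked_list_count(const struct list_head *head)
{
	const struct list_head *pos;
	long n = 0;

	for (pos = head->next; pos != head; pos = pos->next) {
		if (!node_links_ok(pos))
			return -1;
		n++;
	}
	return n;
}
```

A real implementation would report the corruption with a WARN rather than return an error code, and it cannot protect against pointers that are themselves unmapped; the sketch only shows where the check sits in the walk.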
On Mon, Jun 24, 2013 at 07:35:10PM +0200, Oleg Nesterov wrote:
> > Not sure this is helpful, but..
>
> This makes me think that something is seriously broken.
>
> Or I do not understand this stuff at all. Quite possible too.
> Steven, could you please help?
>
> But this is already call
On Mon, Jun 24, 2013 at 01:53:11PM -0400, Steven Rostedt wrote:
> > Also. watchdog_timer_fn() calls printk() only if it detects the
> > lockup, so I assume you hit another one?
>
> Probably.
Yeah, unfortunately it happened while I was travelling home to the box,
so I couldn't stop it after t
On Mon, Jun 24, 2013 at 02:26:00PM -0600, Khalid Aziz wrote:
> @@ -821,7 +821,7 @@ struct blogic_ccb {
> unsigned char cdblen; /* Byte 2 */
> unsigned char sense_datalen;/* Byte 3 */
> u32 datalen;
Took a lot longer to trigger this time. (13 hours of runtime).
This trace may still not be from the first lockup, as a flood of
them happened at the same time.
# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 3.10.0-rc7+
# -