On Mon, Dec 02, 2013 at 06:58:57PM -0800, Linus Torvalds wrote:
> In other words, it's unsafe to protect reference counts inside objects
> with anything but spinlocks and/or atomic refcounts. Or you have to
> have the lock *outside* the object you're protecting (which is often
> what you want for
Once more on this issue, because I've been unable to stop thinking
about it, and noticed that it's actually even subtler than I thought
initially.
On Mon, Dec 2, 2013 at 8:00 AM, Linus Torvalds
wrote:
>
> All our reference counting etc seems right, but we have one very
> subtle bug: on the
On Mon, Dec 02, 2013 at 08:00:38AM -0800, Linus Torvalds wrote:
> On Sat, Nov 30, 2013 at 1:08 PM, Linus Torvalds
> wrote:
> >
> > I still don't see what could be wrong with the pipe_inode_info thing,
> > but the fact that it's been so consistent in your traces does make me
> > suspect it really
* Al Viro wrote:
> On Mon, Dec 02, 2013 at 05:27:55PM +0100, Ingo Molnar wrote:
>
> > It's not like there should be many (any?) VFS operations where a pipe
> > is used via i_mutex and pipe->mutex in parallel, which would improve
> > scalability - so I don't see the scalability advantage.
On Mon, Dec 02, 2013 at 05:27:55PM +0100, Ingo Molnar wrote:
> It's not like there should be many (any?) VFS operations where a pipe
> is used via i_mutex and pipe->mutex in parallel, which would improve
> scalability - so I don't see the scalability advantage. (But I might
> be missing something)
* Linus Torvalds wrote:
> On Sat, Nov 30, 2013 at 1:08 PM, Linus Torvalds
> wrote:
> >
> > I still don't see what could be wrong with the pipe_inode_info thing,
> > but the fact that it's been so consistent in your traces does make me
> > suspect it really is *that* particular slab.
>
> I
On Sat, Nov 30, 2013 at 1:08 PM, Linus Torvalds
wrote:
>
> I still don't see what could be wrong with the pipe_inode_info thing,
> but the fact that it's been so consistent in your traces does make me
> suspect it really is *that* particular slab.
I think I finally found it.
I've spent waaayy
On Sat, Nov 30, 2013 at 1:43 AM, Simon Kirby wrote:
>
> I turned on kmalloc-192 tracing to find what else is using it: struct
> nfs_fh, struct bio, and struct cred. Poking around those, struct bio has
> bi_cnt, but it is way down in the struct. struct cred has "usage", but it
> comes first.
You
On Sat, Nov 30, 2013 at 09:25:33AM -0800, Linus Torvalds wrote:
> On Sat, Nov 30, 2013 at 1:43 AM, Simon Kirby wrote:
>
> > I turned on kmalloc-192 tracing to find what else is using it: struct
> > nfs_fh, struct bio, and struct cred. Poking around those, struct bio has
> > bi_cnt, but it is way
On Sat, Nov 30, 2013 at 1:43 AM, Simon Kirby wrote:
> On Tue, Nov 26, 2013 at 03:16:09PM -0800, Linus Torvalds wrote:
>>
>> The pipe-info structure isn't using its own slab cache, it's just
>> using "kmalloc()". So it by definition will merge with all other
>> kmalloc() allocations of the same
On Tue, Nov 26, 2013 at 03:16:09PM -0800, Linus Torvalds wrote:
> On Mon, Nov 25, 2013 at 4:44 PM, Simon Kirby wrote:
> >
> > I was hoping this or something else by 3.12 would have fixed it, so after
> > testing we deployed this everywhere and turned off the rest of the debug
> > options. I
On Tue, Nov 26, 2013 at 3:16 PM, Linus Torvalds
wrote:
>
> I'm really not very happy with the whole pipe locking logic (or the
> refcounting we do, separately from the "struct inode"), and in that
> sense I'm perfectly willing to blame that code for doing bad things.
> But the fact that it all
On Mon, Nov 25, 2013 at 4:44 PM, Simon Kirby wrote:
>
> I was hoping this or something else by 3.12 would have fixed it, so after
> testing we deployed this everywhere and turned off the rest of the debug
> options. I missed slub_debug on one server, though...and it just hit
> another case of
On Tue, Aug 20, 2013 at 12:51:11AM -0700, Ian Applegate wrote:
> Unfortunately no boxen with CONFIG_DEBUG_MUTEXES among them. I can
> enable on a few and should have some results within the day. These
> mainly serve (quite a bit of) HTTP/S cache traffic.
>
> On Tue, Aug 20, 2013 at 12:21 AM, Al
On Mon, Aug 19, 2013 at 04:31:38PM -0700, Simon Kirby wrote:
> On Mon, Aug 19, 2013 at 05:24:41PM -0400, Chris Mason wrote:
>
> > Quoting Linus Torvalds (2013-08-19 17:16:36)
> > > On Mon, Aug 19, 2013 at 1:29 PM, Christoph Lameter
> > > wrote:
> > > > On Mon, 19 Aug 2013, Simon Kirby wrote:
>
On Tue, Aug 20, 2013 at 12:17:52AM -0700, Ian Applegate wrote:
> We are also seeing this or a similar issue. On a fairly widespread
> deployment of 3.10.1 & 3.10.6 this occurred fairly consistently on the
> order of 36 days (combined MTBF.)
Do you have any boxen with CONFIG_DEBUG_MUTEXES among those?
We are also seeing this or a similar issue. On a fairly widespread
deployment of 3.10.1 & 3.10.6 this occurred fairly consistently on the
order of 36 days (combined MTBF.)
[28974.739774] ------------[ cut here ]------------
[28974.744980] kernel BUG at mm/slub.c:3352!
[28974.749502] invalid
On Mon, Aug 19, 2013 at 02:16:36PM -0700, Linus Torvalds wrote:
> On Mon, Aug 19, 2013 at 1:29 PM, Christoph Lameter wrote:
> > On Mon, 19 Aug 2013, Simon Kirby wrote:
> >
> >>[... ] The
> >> alloc/free traces are always the same -- always alloc_pipe_info and
> >> free_pipe_info. This is
On Mon, Aug 19, 2013 at 05:24:41PM -0400, Chris Mason wrote:
> Quoting Linus Torvalds (2013-08-19 17:16:36)
> > On Mon, Aug 19, 2013 at 1:29 PM, Christoph Lameter wrote:
> > > On Mon, 19 Aug 2013, Simon Kirby wrote:
> > >
> > >>[... ] The
> > >> alloc/free traces are always the same --
Quoting Linus Torvalds (2013-08-19 17:16:36)
> On Mon, Aug 19, 2013 at 1:29 PM, Christoph Lameter wrote:
> > On Mon, 19 Aug 2013, Simon Kirby wrote:
> >
> >>[... ] The
> >> alloc/free traces are always the same -- always alloc_pipe_info and
> >> free_pipe_info. This is seen on 3.10 and (now)
On Mon, Aug 19, 2013 at 1:29 PM, Christoph Lameter wrote:
> On Mon, 19 Aug 2013, Simon Kirby wrote:
>
>>[... ] The
>> alloc/free traces are always the same -- always alloc_pipe_info and
>> free_pipe_info. This is seen on 3.10 and (now) 3.11-rc4:
>>
>> Object 880090f19e78: 6b 6b 6b 6b 6c
On Mon, 19 Aug 2013, Simon Kirby wrote:
> Object 880090f19e78: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> kkkklkkkkkkkkkkk
This looks like an increment after free in the second 32 bit value of the
structure. First 32 bit value's poison is unchanged.
> CONFIG_EFENCE? :)
I think the
On Sat, Jul 06, 2013 at 11:27:38AM +0300, Pekka Enberg wrote:
> On Sat, Jul 6, 2013 at 3:09 AM, Simon Kirby wrote:
> > We saw two Oopses overnight on two separate boxes that seem possibly
> > related, but both are weird. These boxes typically run btrfs for rsync
> > snapshot backups (and usually
On Sat, Jul 6, 2013 at 3:09 AM, Simon Kirby wrote:
> We saw two Oopses overnight on two separate boxes that seem possibly
> related, but both are weird. These boxes typically run btrfs for rsync
> snapshot backups (and usually Oops in btrfs ;), but not this time!
> backup02 was running 3.10-rc6
We saw two Oopses overnight on two separate boxes that seem possibly
related, but both are weird. These boxes typically run btrfs for rsync
snapshot backups (and usually Oops in btrfs ;), but not this time!
backup02 was running 3.10-rc6 plus btrfs-next at the time, and backup03
was running 3.10