Re: [PATCH v3] fs: record task name which froze superblock

2015-03-09 Thread Alexey Dobriyan
On Mon, Mar 2, 2015 at 7:46 AM, Mateusz Guzik  wrote:
> On Mon, Mar 02, 2015 at 05:38:29AM +0100, Mateusz Guzik wrote:
>> On Sun, Mar 01, 2015 at 08:31:26AM +1100, Dave Chinner wrote:
>> > On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote:
>> > > Freezing and thawing are separate system calls, task which is supposed
>> > > to thaw filesystem/superblock can disappear due to crash or not thaw
>> > > due to a bug. At least record task name (we can't take task_struct
>> > > reference) to make support engineer's life easier.
>> > >
>> > > Hopefully 16 bytes per superblock isn't much.
>> > >
>> > > TASK_COMM_LEN definition (which is userspace ABI, see 
>> > > prctl(PR_SET_NAME)) is
>> > > moved to userspace exported header to not drag sched.h into every fs.h 
>> > > inclusion.
>> > >
>> > > Signed-off-by: Alexey Dobriyan 
>> >
>> > Freeze/thaw can be nested at the block level. That means the
>> > sb->s_writers.freeze_comm can point at the wrong process. i.e.
>> >
>> > Task A  Task B
>> > freeze_bdev
>> >   freeze_super
>> > freeze_comm = A
>> > freeze_bdev
>> > .
>> > thaw_bdev
>> >  
>> > 
>> >
>> > At this point, the block device will never be unthawed, but
>> > the debug field is now pointing to the wrong task. i.e. The debug
>> > helper has not recorded the process that is actually causing the
>> > problem, and leads us all off on a wild goose chase down the wrong
>> > path.
>> >
>> > IMO, debug code is only useful if it's reliable.
>> >
>>
>> It can be trivially modified to be very useful to support people.
>>
>> Actually this patch clears saved task name on unfreeze, so in this
>> particular scenario we would end up with no data.
>>
>> Freezer and unfreezer names don't even have to match, so there is not
>> much we can do here (e.g. recording all names in a linked list or
>> something is a non-starter because of this).
>>
>> I propose the following:
>> - on freezing:
>> 1. if 0->1 save the name
>> 2. if 1->2 have a flag to note there is an additional freezer
>> - on unfreezing
>> 1. if 1->0 clear the flag
>> 2. DO NOT clear the name in any case
>>
>
> Now that I sent this e-mail I realized we could actually keep a linked
> list of freezer names. Unfreezing would delete all elements when going
> 1->0, but would not touch it otherwise.

If you do linked list two processes constantly keeping superblock frozen
will allocate all the memory:

F
  F
  T
  F
  T
 ...

After all valid objections and comments I think the way to proceed is
a) record freeze_comm, never clear it (maybe record thaw_comm)
b) record "i am not the only one" flag
c) introduce debug_freeze which will "record" everything in dmesg
for situations when crashdump is not used but serial console is.

Alexey
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] fs: record task name which froze superblock

2015-03-09 Thread Alexey Dobriyan
On Mon, Mar 2, 2015 at 7:46 AM, Mateusz Guzik mgu...@redhat.com wrote:
 On Mon, Mar 02, 2015 at 05:38:29AM +0100, Mateusz Guzik wrote:
 On Sun, Mar 01, 2015 at 08:31:26AM +1100, Dave Chinner wrote:
  On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote:
   Freezing and thawing are separate system calls, task which is supposed
   to thaw filesystem/superblock can disappear due to crash or not thaw
   due to a bug. At least record task name (we can't take task_struct
   reference) to make support engineer's life easier.
  
   Hopefully 16 bytes per superblock isn't much.
  
   TASK_COMM_LEN definition (which is userspace ABI, see 
   prctl(PR_SET_NAME)) is
   moved to userspace exported header to not drag sched.h into every fs.h 
   inclusion.
  
   Signed-off-by: Alexey Dobriyan adobri...@gmail.com
 
  Freeze/thaw can be nested at the block level. That means the
  sb-s_writers.freeze_comm can point at the wrong process. i.e.
 
  Task A  Task B
  freeze_bdev
freeze_super
  freeze_comm = A
  freeze_bdev
  .
  thaw_bdev
   device still frozen
  crash
 
  At this point, the block device will never be unthawed, but
  the debug field is now pointing to the wrong task. i.e. The debug
  helper has not recorded the process that is actually causing the
  problem, and leads us all off on a wild goose chase down the wrong
  path.
 
  IMO, debug code is only useful if it's reliable.
 

 It can be trivially modified to be very useful to support people.

 Actually this patch clears saved task name on unfreeze, so in this
 particular scenario we would end up with no data.

 Freezer and unfreezer names don't even have to match, so there is not
 much we can do here (e.g. recording all names in a linked list or
 something is a non-starter because of this).

 I propose the following:
 - on freezing:
 1. if 0-1 save the name
 2. if 1-2 have a flag to note there is an additional freezer
 - on unfreezing
 1. if 1-0 clear the flag
 2. DO NOT clear the name in any case


 Now that I sent this e-mail I realized we could actually keep a linked
 list of freezer names. Unfreezing would delete all elements when going
 1-0, but would not touch it otherwise.

If you do linked list two processes constantly keeping superblock frozen
will allocate all the memory:

F
  F
  T
  F
  T
 ...

After all valid objections and comments I think the way to proceed is
a) record freeze_comm, never clear it (maybe record thaw_comm)
b) record i am not the only one flag
c) introduce debug_freeze which will record everything in dmesg
for situations when crashdump is not used but serial console is.

Alexey
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] fs: record task name which froze superblock

2015-03-04 Thread Alexey Dobriyan
On Mon, Mar 2, 2015 at 7:38 AM, Mateusz Guzik  wrote:

> As explained below, this one task name is already very useful and likely
> covers majority of real life use cases.
>
> While working in support we were getting a lot of vmcores where hung task
> detector panicked the kernel because a lot of tasks were blocked
> in UN state trying to write to frozen filesystems. I presume OP has
> similar story.

Yes, the intended "use" case is 1 freezer which hopefully covers
majority of bug reports.

> Some back on forth commuication almost always revealed one process e.g.
> freezing stuff and then blocking itself trying to access it. While we
> could see it blocked, we had no presumptive evidence to pin freezing on
> it. A matching name, while still not 100% conclusive, would be ok enough
> to push the case forward and avoid a rountrip of systemap scripts
> showing freezer process tree.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] fs: record task name which froze superblock

2015-03-04 Thread Alexey Dobriyan
On Mon, Mar 2, 2015 at 7:38 AM, Mateusz Guzik mgu...@redhat.com wrote:

 As explained below, this one task name is already very useful and likely
 covers majority of real life use cases.

 While working in support we were getting a lot of vmcores where hung task
 detector panicked the kernel because a lot of tasks were blocked
 in UN state trying to write to frozen filesystems. I presume OP has
 similar story.

Yes, the intended use case is 1 freezer which hopefully covers
majority of bug reports.

 Some back on forth commuication almost always revealed one process e.g.
 freezing stuff and then blocking itself trying to access it. While we
 could see it blocked, we had no presumptive evidence to pin freezing on
 it. A matching name, while still not 100% conclusive, would be ok enough
 to push the case forward and avoid a rountrip of systemap scripts
 showing freezer process tree.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] fs: record task name which froze superblock

2015-03-02 Thread Dave Chinner
On Mon, Mar 02, 2015 at 05:38:29AM +0100, Mateusz Guzik wrote:
> On Sun, Mar 01, 2015 at 08:31:26AM +1100, Dave Chinner wrote:
> > On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote:
> > > Freezing and thawing are separate system calls, task which is supposed
> > > to thaw filesystem/superblock can disappear due to crash or not thaw
> > > due to a bug. At least record task name (we can't take task_struct
> > > reference) to make support engineer's life easier.
> > > 
> > > Hopefully 16 bytes per superblock isn't much.
> > > 
> > > TASK_COMM_LEN definition (which is userspace ABI, see prctl(PR_SET_NAME)) 
> > > is
> > > moved to userspace exported header to not drag sched.h into every fs.h 
> > > inclusion.
> > > 
> > > Signed-off-by: Alexey Dobriyan 
> > 
> > Freeze/thaw can be nested at the block level. That means the
> > sb->s_writers.freeze_comm can point at the wrong process. i.e.
> > 
> > Task A  Task B
> > freeze_bdev
> >   freeze_super
> > freeze_comm = A
> > freeze_bdev
> > .
> > thaw_bdev
> >  
> > 
> > 
> > At this point, the block device will never be unthawed, but
> > the debug field is now pointing to the wrong task. i.e. The debug
> > helper has not recorded the process that is actually causing the
> > problem, and leads us all off on a wild goose chase down the wrong
> > path.
> > 
> > IMO, debug code is only useful if it's reliable.
> > 
> 
> It can be trivially modified to be very useful to support people.
> 
> Actually this patch clears saved task name on unfreeze, so in this
> particular scenario we would end up with no data.

It only clears it i thaw_super(), which is *not called* until the
last nested thaw_bdev() call is made.

When the system is hung what we actually need to know is who is
responsible for *thawing* the filesystem and then we can work out
why that hasn't run.  What this code tries to do is identify who
froze the filesystem and so indicate who *might* be responsible for
thawing it. If we mis-identify the agent who holds the freeze
status, then we fail to identify who needs to run the thaw and hence
we're still stuck not knowing WTF happened

I understand why you want to record this - I'm not arguing that we
shouldn't do this. My point is that we should *make it reliable* and
not in any way ambiguous, otherwise we failed to solve the problem
it was intended for.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] fs: record task name which froze superblock

2015-03-02 Thread Dave Chinner
On Mon, Mar 02, 2015 at 05:38:29AM +0100, Mateusz Guzik wrote:
 On Sun, Mar 01, 2015 at 08:31:26AM +1100, Dave Chinner wrote:
  On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote:
   Freezing and thawing are separate system calls, task which is supposed
   to thaw filesystem/superblock can disappear due to crash or not thaw
   due to a bug. At least record task name (we can't take task_struct
   reference) to make support engineer's life easier.
   
   Hopefully 16 bytes per superblock isn't much.
   
   TASK_COMM_LEN definition (which is userspace ABI, see prctl(PR_SET_NAME)) 
   is
   moved to userspace exported header to not drag sched.h into every fs.h 
   inclusion.
   
   Signed-off-by: Alexey Dobriyan adobri...@gmail.com
  
  Freeze/thaw can be nested at the block level. That means the
  sb-s_writers.freeze_comm can point at the wrong process. i.e.
  
  Task A  Task B
  freeze_bdev
freeze_super
  freeze_comm = A
  freeze_bdev
  .
  thaw_bdev
   device still frozen
  crash
  
  At this point, the block device will never be unthawed, but
  the debug field is now pointing to the wrong task. i.e. The debug
  helper has not recorded the process that is actually causing the
  problem, and leads us all off on a wild goose chase down the wrong
  path.
  
  IMO, debug code is only useful if it's reliable.
  
 
 It can be trivially modified to be very useful to support people.
 
 Actually this patch clears saved task name on unfreeze, so in this
 particular scenario we would end up with no data.

It only clears it i thaw_super(), which is *not called* until the
last nested thaw_bdev() call is made.

When the system is hung what we actually need to know is who is
responsible for *thawing* the filesystem and then we can work out
why that hasn't run.  What this code tries to do is identify who
froze the filesystem and so indicate who *might* be responsible for
thawing it. If we mis-identify the agent who holds the freeze
status, then we fail to identify who needs to run the thaw and hence
we're still stuck not knowing WTF happened

I understand why you want to record this - I'm not arguing that we
shouldn't do this. My point is that we should *make it reliable* and
not in any way ambiguous, otherwise we failed to solve the problem
it was intended for.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] fs: record task name which froze superblock

2015-03-01 Thread Mateusz Guzik
On Mon, Mar 02, 2015 at 05:38:29AM +0100, Mateusz Guzik wrote:
> On Sun, Mar 01, 2015 at 08:31:26AM +1100, Dave Chinner wrote:
> > On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote:
> > > Freezing and thawing are separate system calls, task which is supposed
> > > to thaw filesystem/superblock can disappear due to crash or not thaw
> > > due to a bug. At least record task name (we can't take task_struct
> > > reference) to make support engineer's life easier.
> > > 
> > > Hopefully 16 bytes per superblock isn't much.
> > > 
> > > TASK_COMM_LEN definition (which is userspace ABI, see prctl(PR_SET_NAME)) 
> > > is
> > > moved to userspace exported header to not drag sched.h into every fs.h 
> > > inclusion.
> > > 
> > > Signed-off-by: Alexey Dobriyan 
> > 
> > Freeze/thaw can be nested at the block level. That means the
> > sb->s_writers.freeze_comm can point at the wrong process. i.e.
> > 
> > Task A  Task B
> > freeze_bdev
> >   freeze_super
> > freeze_comm = A
> > freeze_bdev
> > .
> > thaw_bdev
> >  
> > 
> > 
> > At this point, the block device will never be unthawed, but
> > the debug field is now pointing to the wrong task. i.e. The debug
> > helper has not recorded the process that is actually causing the
> > problem, and leads us all off on a wild goose chase down the wrong
> > path.
> > 
> > IMO, debug code is only useful if it's reliable.
> > 
> 
> It can be trivially modified to be very useful to support people.
> 
> Actually this patch clears saved task name on unfreeze, so in this
> particular scenario we would end up with no data.
> 
> Freezer and unfreezer names don't even have to match, so there is not
> much we can do here (e.g. recording all names in a linked list or
> something is a non-starter because of this).
> 
> I propose the following:
> - on freezing:
> 1. if 0->1 save the name
> 2. if 1->2 have a flag to note there is an additional freezer
> - on unfreezing
> 1. if 1->0 clear the flag
> 2. DO NOT clear the name in any case
> 

Now that I sent this e-mail I realized we could actually keep a linked
list of freezer names. Unfreezing would delete all elements when going
1->0, but would not touch it otherwise.

This would cover a less likely use case though, so I would be fine
either way FWIW.

Just my $0,03.

> This way we keep the name for possible future reference and we know
> whether something with this name was the sole freezer in this cycle.
> 
> As explained below, this one task name is already very useful and likely
> covers majority of real life use cases.
> 
> While working in support we were getting a lot of vmcores where hung task
> detector panicked the kernel because a lot of tasks were blocked
> in UN state trying to write to frozen filesystems. I presume OP has
> similar story.
> 
> Some back on forth commuication almost always revealed one process e.g.
> freezing stuff and then blocking itself trying to access it. While we
> could see it blocked, we had no presumptive evidence to pin freezing on
> it. A matching name, while still not 100% conclusive, would be ok enough
> to push the case forward and avoid a rountrip of systemap scripts
> showing freezer process tree.
> 

-- 
Mateusz Guzik
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] fs: record task name which froze superblock

2015-03-01 Thread Mateusz Guzik
On Sun, Mar 01, 2015 at 08:31:26AM +1100, Dave Chinner wrote:
> On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote:
> > Freezing and thawing are separate system calls, task which is supposed
> > to thaw filesystem/superblock can disappear due to crash or not thaw
> > due to a bug. At least record task name (we can't take task_struct
> > reference) to make support engineer's life easier.
> > 
> > Hopefully 16 bytes per superblock isn't much.
> > 
> > TASK_COMM_LEN definition (which is userspace ABI, see prctl(PR_SET_NAME)) is
> > moved to userspace exported header to not drag sched.h into every fs.h 
> > inclusion.
> > 
> > Signed-off-by: Alexey Dobriyan 
> 
> Freeze/thaw can be nested at the block level. That means the
> sb->s_writers.freeze_comm can point at the wrong process. i.e.
> 
> Task ATask B
> freeze_bdev
>   freeze_super
> freeze_comm = A
>   freeze_bdev
> .
> thaw_bdev
>  
>   
> 
> At this point, the block device will never be unthawed, but
> the debug field is now pointing to the wrong task. i.e. The debug
> helper has not recorded the process that is actually causing the
> problem, and leads us all off on a wild goose chase down the wrong
> path.
> 
> IMO, debug code is only useful if it's reliable.
> 

It can be trivially modified to be very useful to support people.

Actually this patch clears saved task name on unfreeze, so in this
particular scenario we would end up with no data.

Freezer and unfreezer names don't even have to match, so there is not
much we can do here (e.g. recording all names in a linked list or
something is a non-starter because of this).

I propose the following:
- on freezing:
1. if 0->1 save the name
2. if 1->2 have a flag to note there is an additional freezer
- on unfreezing
1. if 1->0 clear the flag
2. DO NOT clear the name in any case

This way we keep the name for possible future reference and we know
whether something with this name was the sole freezer in this cycle.

As explained below, this one task name is already very useful and likely
covers majority of real life use cases.

While working in support we were getting a lot of vmcores where hung task
detector panicked the kernel because a lot of tasks were blocked
in UN state trying to write to frozen filesystems. I presume OP has
similar story.

Some back on forth commuication almost always revealed one process e.g.
freezing stuff and then blocking itself trying to access it. While we
could see it blocked, we had no presumptive evidence to pin freezing on
it. A matching name, while still not 100% conclusive, would be ok enough
to push the case forward and avoid a rountrip of systemap scripts
showing freezer process tree.

-- 
Mateusz Guzik
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] fs: record task name which froze superblock

2015-03-01 Thread Mateusz Guzik
On Sun, Mar 01, 2015 at 08:31:26AM +1100, Dave Chinner wrote:
 On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote:
  Freezing and thawing are separate system calls, task which is supposed
  to thaw filesystem/superblock can disappear due to crash or not thaw
  due to a bug. At least record task name (we can't take task_struct
  reference) to make support engineer's life easier.
  
  Hopefully 16 bytes per superblock isn't much.
  
  TASK_COMM_LEN definition (which is userspace ABI, see prctl(PR_SET_NAME)) is
  moved to userspace exported header to not drag sched.h into every fs.h 
  inclusion.
  
  Signed-off-by: Alexey Dobriyan adobri...@gmail.com
 
 Freeze/thaw can be nested at the block level. That means the
 sb-s_writers.freeze_comm can point at the wrong process. i.e.
 
 Task ATask B
 freeze_bdev
   freeze_super
 freeze_comm = A
   freeze_bdev
 .
 thaw_bdev
  device still frozen
   crash
 
 At this point, the block device will never be unthawed, but
 the debug field is now pointing to the wrong task. i.e. The debug
 helper has not recorded the process that is actually causing the
 problem, and leads us all off on a wild goose chase down the wrong
 path.
 
 IMO, debug code is only useful if it's reliable.
 

It can be trivially modified to be very useful to support people.

Actually this patch clears saved task name on unfreeze, so in this
particular scenario we would end up with no data.

Freezer and unfreezer names don't even have to match, so there is not
much we can do here (e.g. recording all names in a linked list or
something is a non-starter because of this).

I propose the following:
- on freezing:
1. if 0-1 save the name
2. if 1-2 have a flag to note there is an additional freezer
- on unfreezing
1. if 1-0 clear the flag
2. DO NOT clear the name in any case

This way we keep the name for possible future reference and we know
whether something with this name was the sole freezer in this cycle.

As explained below, this one task name is already very useful and likely
covers majority of real life use cases.

While working in support we were getting a lot of vmcores where hung task
detector panicked the kernel because a lot of tasks were blocked
in UN state trying to write to frozen filesystems. I presume OP has
similar story.

Some back on forth commuication almost always revealed one process e.g.
freezing stuff and then blocking itself trying to access it. While we
could see it blocked, we had no presumptive evidence to pin freezing on
it. A matching name, while still not 100% conclusive, would be ok enough
to push the case forward and avoid a rountrip of systemap scripts
showing freezer process tree.

-- 
Mateusz Guzik
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] fs: record task name which froze superblock

2015-03-01 Thread Mateusz Guzik
On Mon, Mar 02, 2015 at 05:38:29AM +0100, Mateusz Guzik wrote:
 On Sun, Mar 01, 2015 at 08:31:26AM +1100, Dave Chinner wrote:
  On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote:
   Freezing and thawing are separate system calls, task which is supposed
   to thaw filesystem/superblock can disappear due to crash or not thaw
   due to a bug. At least record task name (we can't take task_struct
   reference) to make support engineer's life easier.
   
   Hopefully 16 bytes per superblock isn't much.
   
   TASK_COMM_LEN definition (which is userspace ABI, see prctl(PR_SET_NAME)) 
   is
   moved to userspace exported header to not drag sched.h into every fs.h 
   inclusion.
   
   Signed-off-by: Alexey Dobriyan adobri...@gmail.com
  
  Freeze/thaw can be nested at the block level. That means the
  sb-s_writers.freeze_comm can point at the wrong process. i.e.
  
  Task A  Task B
  freeze_bdev
freeze_super
  freeze_comm = A
  freeze_bdev
  .
  thaw_bdev
   device still frozen
  crash
  
  At this point, the block device will never be unthawed, but
  the debug field is now pointing to the wrong task. i.e. The debug
  helper has not recorded the process that is actually causing the
  problem, and leads us all off on a wild goose chase down the wrong
  path.
  
  IMO, debug code is only useful if it's reliable.
  
 
 It can be trivially modified to be very useful to support people.
 
 Actually this patch clears saved task name on unfreeze, so in this
 particular scenario we would end up with no data.
 
 Freezer and unfreezer names don't even have to match, so there is not
 much we can do here (e.g. recording all names in a linked list or
 something is a non-starter because of this).
 
 I propose the following:
 - on freezing:
 1. if 0-1 save the name
 2. if 1-2 have a flag to note there is an additional freezer
 - on unfreezing
 1. if 1-0 clear the flag
 2. DO NOT clear the name in any case
 

Now that I sent this e-mail I realized we could actually keep a linked
list of freezer names. Unfreezing would delete all elements when going
1-0, but would not touch it otherwise.

This would cover a less likely use case though, so I would be fine
either way FWIW.

Just my $0,03.

 This way we keep the name for possible future reference and we know
 whether something with this name was the sole freezer in this cycle.
 
 As explained below, this one task name is already very useful and likely
 covers majority of real life use cases.
 
 While working in support we were getting a lot of vmcores where hung task
 detector panicked the kernel because a lot of tasks were blocked
 in UN state trying to write to frozen filesystems. I presume OP has
 similar story.
 
 Some back on forth commuication almost always revealed one process e.g.
 freezing stuff and then blocking itself trying to access it. While we
 could see it blocked, we had no presumptive evidence to pin freezing on
 it. A matching name, while still not 100% conclusive, would be ok enough
 to push the case forward and avoid a rountrip of systemap scripts
 showing freezer process tree.
 

-- 
Mateusz Guzik
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] fs: record task name which froze superblock

2015-02-28 Thread Dave Chinner
On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote:
> Freezing and thawing are separate system calls, task which is supposed
> to thaw filesystem/superblock can disappear due to crash or not thaw
> due to a bug. At least record task name (we can't take task_struct
> reference) to make support engineer's life easier.
> 
> Hopefully 16 bytes per superblock isn't much.
> 
> TASK_COMM_LEN definition (which is userspace ABI, see prctl(PR_SET_NAME)) is
> moved to userspace exported header to not drag sched.h into every fs.h 
> inclusion.
> 
> Signed-off-by: Alexey Dobriyan 

Freeze/thaw can be nested at the block level. That means the
sb->s_writers.freeze_comm can point at the wrong process. i.e.

Task A  Task B
freeze_bdev
  freeze_super
freeze_comm = A
freeze_bdev
.
thaw_bdev
 


At this point, the block device will never be unthawed, but
the debug field is now pointing to the wrong task. i.e. The debug
helper has not recorded the process that is actually causing the
problem, and leads us all off on a wild goose chase down the wrong
path.

IMO, debug code is only useful if it's reliable.

> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -303,9 +303,6 @@ extern char ___assert_task_state[1 - 2*!!(
>  
>  #endif
>  
> -/* Task command name length */
> -#define TASK_COMM_LEN 16
> -
>  #include 
>  
>  /*
> --- a/include/uapi/linux/sched.h
> +++ b/include/uapi/linux/sched.h
> @@ -49,4 +49,7 @@
>   */
>  #define SCHED_FLAG_RESET_ON_FORK 0x01
>  
> +/* Task command name length */
> +#define TASK_COMM_LEN 16
> +
>  #endif /* _UAPI_LINUX_SCHED_H */

That should be a separate patch, sent to the scheduler maintainers
for review. AFAICT, it isn't part of the user API - it's not defined
in the man page which just says "can be up to 16 bytes".

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3] fs: record task name which froze superblock

2015-02-28 Thread Alexey Dobriyan
Freezing and thawing are separate system calls, task which is supposed
to thaw filesystem/superblock can disappear due to crash or not thaw
due to a bug. At least record task name (we can't take task_struct
reference) to make support engineer's life easier.

Hopefully 16 bytes per superblock isn't much.

TASK_COMM_LEN definition (which is userspace ABI, see prctl(PR_SET_NAME)) is
moved to userspace exported header to not drag sched.h into every fs.h 
inclusion.

Signed-off-by: Alexey Dobriyan 
---

 fs/block_dev.c |   12 
 fs/ioctl.c |   22 --
 fs/super.c |2 ++
 include/linux/fs.h |6 ++
 include/linux/sched.h  |3 ---
 include/uapi/linux/sched.h |3 +++
 6 files changed, 35 insertions(+), 13 deletions(-)

--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -227,9 +227,11 @@ struct super_block *freeze_bdev(struct block_device *bdev)
sb = get_active_super(bdev);
if (!sb)
goto out;
-   if (sb->s_op->freeze_super)
+   if (sb->s_op->freeze_super) {
error = sb->s_op->freeze_super(sb);
-   else
+   if (error == 0)
+   get_task_comm(sb->s_writers.freeze_comm, current);
+   } else
error = freeze_super(sb);
if (error) {
deactivate_super(sb);
@@ -267,9 +269,11 @@ int thaw_bdev(struct block_device *bdev, struct 
super_block *sb)
if (!sb)
goto out;
 
-   if (sb->s_op->thaw_super)
+   if (sb->s_op->thaw_super) {
error = sb->s_op->thaw_super(sb);
-   else
+   if (error == 0)
+   memset(sb->s_writers.freeze_comm, 0, TASK_COMM_LEN);
+   } else
error = thaw_super(sb);
if (error) {
bdev->bd_fsfreeze_count++;
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -518,6 +518,7 @@ static int ioctl_fioasync(unsigned int fd, struct file 
*filp,
 static int ioctl_fsfreeze(struct file *filp)
 {
struct super_block *sb = file_inode(filp)->i_sb;
+   int rv;
 
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -527,22 +528,31 @@ static int ioctl_fsfreeze(struct file *filp)
return -EOPNOTSUPP;
 
/* Freeze */
-   if (sb->s_op->freeze_super)
-   return sb->s_op->freeze_super(sb);
-   return freeze_super(sb);
+   if (sb->s_op->freeze_super) {
+   rv = sb->s_op->freeze_super(sb);
+   if (rv == 0)
+   get_task_comm(sb->s_writers.freeze_comm, current);
+   } else
+   rv = freeze_super(sb);
+   return rv;
 }
 
 static int ioctl_fsthaw(struct file *filp)
 {
struct super_block *sb = file_inode(filp)->i_sb;
+   int rv;
 
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
 
/* Thaw */
-   if (sb->s_op->thaw_super)
-   return sb->s_op->thaw_super(sb);
-   return thaw_super(sb);
+   if (sb->s_op->thaw_super) {
+   rv = sb->s_op->thaw_super(sb);
+   if (rv == 0)
+   memset(sb->s_writers.freeze_comm, 0, TASK_COMM_LEN);
+   } else
+   rv = thaw_super(sb);
+   return rv;
 }
 
 /*
--- a/fs/super.c
+++ b/fs/super.c
@@ -1351,6 +1351,7 @@ int freeze_super(struct super_block *sb)
 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
 */
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+   get_task_comm(sb->s_writers.freeze_comm, current);
up_write(>s_umount);
return 0;
 }
@@ -1387,6 +1388,7 @@ int thaw_super(struct super_block *sb)
 
 out:
sb->s_writers.frozen = SB_UNFROZEN;
+   memset(sb->s_writers.freeze_comm, 0, TASK_COMM_LEN);
smp_wmb();
wake_up(>s_writers.wait_unfrozen);
deactivate_locked_super(sb);
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -33,6 +33,7 @@
 
 #include 
 #include 
+#include 
 
 struct backing_dev_info;
 struct export_operations;
@@ -1217,6 +1218,11 @@ struct sb_writers {
int frozen; /* Is sb frozen? */
wait_queue_head_t   wait_unfrozen;  /* queue for waiting for
   sb to be thawed */
+   /*
+* who froze superblock
+* write-only field for traces in crashdump
+*/
+   charfreeze_comm[TASK_COMM_LEN];
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map  lock_map[SB_FREEZE_LEVELS];
 #endif
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -303,9 +303,6 @@ extern char ___assert_task_state[1 - 2*!!(
 
 #endif
 
-/* Task command name length */
-#define TASK_COMM_LEN 16
-
 #include 
 
 /*
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -49,4 +49,7 @@
  */
 #define SCHED_FLAG_RESET_ON_FORK   0x01
 
+/* Task command name length */
+#define TASK_COMM_LEN 16
+
 

[PATCH v3] fs: record task name which froze superblock

2015-02-28 Thread Alexey Dobriyan
Freezing and thawing are separate system calls, task which is supposed
to thaw filesystem/superblock can disappear due to crash or not thaw
due to a bug. At least record task name (we can't take task_struct
reference) to make support engineer's life easier.

Hopefully 16 bytes per superblock isn't much.

TASK_COMM_LEN definition (which is userspace ABI, see prctl(PR_SET_NAME)) is
moved to userspace exported header to not drag sched.h into every fs.h 
inclusion.

Signed-off-by: Alexey Dobriyan adobri...@gmail.com
---

 fs/block_dev.c |   12 
 fs/ioctl.c |   22 --
 fs/super.c |2 ++
 include/linux/fs.h |6 ++
 include/linux/sched.h  |3 ---
 include/uapi/linux/sched.h |3 +++
 6 files changed, 35 insertions(+), 13 deletions(-)

--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -227,9 +227,11 @@ struct super_block *freeze_bdev(struct block_device *bdev)
sb = get_active_super(bdev);
if (!sb)
goto out;
-   if (sb-s_op-freeze_super)
+   if (sb-s_op-freeze_super) {
error = sb-s_op-freeze_super(sb);
-   else
+   if (error == 0)
+   get_task_comm(sb-s_writers.freeze_comm, current);
+   } else
error = freeze_super(sb);
if (error) {
deactivate_super(sb);
@@ -267,9 +269,11 @@ int thaw_bdev(struct block_device *bdev, struct 
super_block *sb)
if (!sb)
goto out;
 
-   if (sb-s_op-thaw_super)
+   if (sb-s_op-thaw_super) {
error = sb-s_op-thaw_super(sb);
-   else
+   if (error == 0)
+   memset(sb-s_writers.freeze_comm, 0, TASK_COMM_LEN);
+   } else
error = thaw_super(sb);
if (error) {
bdev-bd_fsfreeze_count++;
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -518,6 +518,7 @@ static int ioctl_fioasync(unsigned int fd, struct file 
*filp,
 static int ioctl_fsfreeze(struct file *filp)
 {
struct super_block *sb = file_inode(filp)-i_sb;
+   int rv;
 
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -527,22 +528,31 @@ static int ioctl_fsfreeze(struct file *filp)
return -EOPNOTSUPP;
 
/* Freeze */
-   if (sb-s_op-freeze_super)
-   return sb-s_op-freeze_super(sb);
-   return freeze_super(sb);
+   if (sb-s_op-freeze_super) {
+   rv = sb-s_op-freeze_super(sb);
+   if (rv == 0)
+   get_task_comm(sb-s_writers.freeze_comm, current);
+   } else
+   rv = freeze_super(sb);
+   return rv;
 }
 
 static int ioctl_fsthaw(struct file *filp)
 {
struct super_block *sb = file_inode(filp)-i_sb;
+   int rv;
 
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
 
/* Thaw */
-   if (sb-s_op-thaw_super)
-   return sb-s_op-thaw_super(sb);
-   return thaw_super(sb);
+   if (sb-s_op-thaw_super) {
+   rv = sb-s_op-thaw_super(sb);
+   if (rv == 0)
+   memset(sb-s_writers.freeze_comm, 0, TASK_COMM_LEN);
+   } else
+   rv = thaw_super(sb);
+   return rv;
 }
 
 /*
--- a/fs/super.c
+++ b/fs/super.c
@@ -1351,6 +1351,7 @@ int freeze_super(struct super_block *sb)
 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
 */
sb-s_writers.frozen = SB_FREEZE_COMPLETE;
+   get_task_comm(sb-s_writers.freeze_comm, current);
up_write(sb-s_umount);
return 0;
 }
@@ -1387,6 +1388,7 @@ int thaw_super(struct super_block *sb)
 
 out:
sb-s_writers.frozen = SB_UNFROZEN;
+   memset(sb-s_writers.freeze_comm, 0, TASK_COMM_LEN);
smp_wmb();
wake_up(sb-s_writers.wait_unfrozen);
deactivate_locked_super(sb);
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -33,6 +33,7 @@
 
 #include asm/byteorder.h
 #include uapi/linux/fs.h
+#include uapi/linux/sched.h
 
 struct backing_dev_info;
 struct export_operations;
@@ -1217,6 +1218,11 @@ struct sb_writers {
int frozen; /* Is sb frozen? */
wait_queue_head_t   wait_unfrozen;  /* queue for waiting for
   sb to be thawed */
+   /*
+* who froze superblock
+* write-only field for traces in crashdump
+*/
+   charfreeze_comm[TASK_COMM_LEN];
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map  lock_map[SB_FREEZE_LEVELS];
 #endif
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -303,9 +303,6 @@ extern char ___assert_task_state[1 - 2*!!(
 
 #endif
 
-/* Task command name length */
-#define TASK_COMM_LEN 16
-
 #include linux/spinlock.h
 
 /*
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -49,4 +49,7 @@
  */
 #define SCHED_FLAG_RESET_ON_FORK   0x01
 
+/* Task command 

Re: [PATCH v3] fs: record task name which froze superblock

2015-02-28 Thread Dave Chinner
On Sat, Feb 28, 2015 at 05:25:57PM +0300, Alexey Dobriyan wrote:
 Freezing and thawing are separate system calls, task which is supposed
 to thaw filesystem/superblock can disappear due to crash or not thaw
 due to a bug. At least record task name (we can't take task_struct
 reference) to make support engineer's life easier.
 
 Hopefully 16 bytes per superblock isn't much.
 
 TASK_COMM_LEN definition (which is userspace ABI, see prctl(PR_SET_NAME)) is
 moved to userspace exported header to not drag sched.h into every fs.h 
 inclusion.
 
 Signed-off-by: Alexey Dobriyan adobri...@gmail.com

Freeze/thaw can be nested at the block level. That means the
sb-s_writers.freeze_comm can point at the wrong process. i.e.

Task A  Task B
freeze_bdev
  freeze_super
freeze_comm = A
freeze_bdev
.
thaw_bdev
 device still frozen
crash

At this point, the block device will never be unthawed, but
the debug field is now pointing to the wrong task. i.e. The debug
helper has not recorded the process that is actually causing the
problem, and leads us all off on a wild goose chase down the wrong
path.

IMO, debug code is only useful if it's reliable.

 --- a/include/linux/sched.h
 +++ b/include/linux/sched.h
 @@ -303,9 +303,6 @@ extern char ___assert_task_state[1 - 2*!!(
  
  #endif
  
 -/* Task command name length */
 -#define TASK_COMM_LEN 16
 -
  #include linux/spinlock.h
  
  /*
 --- a/include/uapi/linux/sched.h
 +++ b/include/uapi/linux/sched.h
 @@ -49,4 +49,7 @@
   */
  #define SCHED_FLAG_RESET_ON_FORK 0x01
  
 +/* Task command name length */
 +#define TASK_COMM_LEN 16
 +
  #endif /* _UAPI_LINUX_SCHED_H */

That should be a separate patch, sent to the scheduler maintainers
for review. AFAICT, it isn't part of the user API - it's not defined
in the man page which just says can be up to 16 bytes.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/