Re: [PATCH] writeback: safer lock nesting

2018-04-10 Thread Greg Thelen
On Tue, Apr 10, 2018 at 7:44 PM Wang Long  wrote:

> > Hi,
> >
> > [This is an automated email]
> >
> > This commit has been processed by the -stable helper bot and determined
> > to be a high probability candidate for -stable trees. (score: 44.5575)
> >
> > The bot has tested the following trees: v4.16.1, v4.15.16, v4.14.33, v4.9.93, v4.4.127.
> >
> > v4.16.1: Build OK!
> > v4.15.16: Build OK!
> > v4.14.33: Build OK!
> > v4.9.93: Build OK!
> > v4.4.127: Failed to apply! Possible dependencies:
> >  62cccb8c8e7a ("mm: simplify lock_page_memcg()")
> Hi Sasha,

> I tested the memory cgroup code in LTS v4.4. For this issue, 62cccb8c8e7a
> ("mm: simplify lock_page_memcg()") needs to be adjusted, and there are
> several other places that need to be fixed.

> I will make the patch for LTS v4.4 if no one else does.

I'm testing a 4.4-stable patch right now.  ETA is a few hours.
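
For reference, the v4.4 adjustment is mostly mechanical: 4.4 predates
62cccb8c8e7a, so it uses mem_cgroup_begin_page_stat()/mem_cgroup_end_page_stat()
rather than lock_page_memcg()/unlock_page_memcg().  Roughly, the fixed pattern
on 4.4 would look like the sketch below (illustrative only, not the actual
backport; the wrapper function here is made up):

  /* Illustrative sketch, not the actual 4.4-stable backport. */
  static void example_clear_page_stat(struct page *page, struct inode *inode)
  {
          struct mem_cgroup *memcg;
          struct bdi_writeback *wb;
          bool locked;
          unsigned long flags;

          /* may take memcg->move_lock with irqs saved */
          memcg = mem_cgroup_begin_page_stat(page);
          wb = unlocked_inode_to_wb_begin(inode, &locked, &flags);
          /* ... adjust per-wb and per-memcg writeback stats ... */
          /* must restore the saved irq state, not blindly re-enable irqs */
          unlocked_inode_to_wb_end(inode, locked, flags);
          mem_cgroup_end_page_stat(memcg);
  }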


Re: [PATCH] writeback: safer lock nesting

2018-04-10 Thread Wang Long



> Hi,
>
> [This is an automated email]
>
> This commit has been processed by the -stable helper bot and determined
> to be a high probability candidate for -stable trees. (score: 44.5575)
>
> The bot has tested the following trees: v4.16.1, v4.15.16, v4.14.33, v4.9.93, v4.4.127.
>
> v4.16.1: Build OK!
> v4.15.16: Build OK!
> v4.14.33: Build OK!
> v4.9.93: Build OK!
> v4.4.127: Failed to apply! Possible dependencies:
>  62cccb8c8e7a ("mm: simplify lock_page_memcg()")

Hi Sasha,

I tested the memory cgroup code in LTS v4.4. For this issue, 62cccb8c8e7a
("mm: simplify lock_page_memcg()") needs to be adjusted, and there are
several other places that need to be fixed.

I will make the patch for LTS v4.4 if no one else does.

Thanks.



> Please let us know if you'd like to have this patch included in a stable tree.
>
> --
> Thanks,
> Sasha




Re: [PATCH] writeback: safer lock nesting

2018-04-10 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 44.5575)

The bot has tested the following trees: v4.16.1, v4.15.16, v4.14.33, v4.9.93, v4.4.127.

v4.16.1: Build OK!
v4.15.16: Build OK!
v4.14.33: Build OK!
v4.9.93: Build OK!
v4.4.127: Failed to apply! Possible dependencies:
62cccb8c8e7a ("mm: simplify lock_page_memcg()")


Please let us know if you'd like to have this patch included in a stable tree.

--
Thanks,
Sasha

Re: [PATCH] writeback: safer lock nesting

2018-04-06 Thread Greg Thelen
On Fri, Apr 6, 2018 at 1:07 AM Michal Hocko  wrote:

> On Fri 06-04-18 01:03:24, Greg Thelen wrote:
> [...]
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index d4d04fee568a..d51bae5a53e2 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -746,10 +746,11 @@ int inode_congested(struct inode *inode, int cong_bits)
> >   if (inode && inode_to_wb_is_valid(inode)) {
> >   struct bdi_writeback *wb;
> >   bool locked, congested;
> > + unsigned long flags;
> >
> > - wb = unlocked_inode_to_wb_begin(inode, &locked);
> > + wb = unlocked_inode_to_wb_begin(inode, &locked, &flags);

> Wouldn't it be better to have a cookie (struct) rather than 2 parameters
> and let unlocked_inode_to_wb_end DTRT?

Nod.  I'll post a V2 patch with that change.
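
Roughly, I'm thinking of something along these lines (sketch only; details in
the actual V2 may differ):

  struct wb_lock_cookie {
          bool locked;
          unsigned long flags;
  };

  static inline struct bdi_writeback *
  unlocked_inode_to_wb_begin(struct inode *inode, struct wb_lock_cookie *cookie)
  {
          rcu_read_lock();
          /* see the I_WB_SWITCH ordering comment in the existing helper */
          cookie->locked = smp_load_acquire(&inode->i_state) & I_WB_SWITCH;
          if (unlikely(cookie->locked))
                  spin_lock_irqsave(&inode->i_mapping->tree_lock, cookie->flags);
          return inode_to_wb(inode);
  }

  static inline void unlocked_inode_to_wb_end(struct inode *inode,
                                              struct wb_lock_cookie *cookie)
  {
          if (unlikely(cookie->locked))
                  spin_unlock_irqrestore(&inode->i_mapping->tree_lock,
                                         cookie->flags);
          rcu_read_unlock();
  }

That way the end helper restores the saved flags itself, and callers only
carry one cookie around.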

> >   congested = wb_congested(wb, cong_bits);
> > - unlocked_inode_to_wb_end(inode, locked);
> > + unlocked_inode_to_wb_end(inode, locked, flags);
> >   return congested;
> >   }
> --
> Michal Hocko
> SUSE Labs


Re: [PATCH] writeback: safer lock nesting

2018-04-06 Thread Michal Hocko
On Fri 06-04-18 01:03:24, Greg Thelen wrote:
[...]
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index d4d04fee568a..d51bae5a53e2 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -746,10 +746,11 @@ int inode_congested(struct inode *inode, int cong_bits)
>   if (inode && inode_to_wb_is_valid(inode)) {
>   struct bdi_writeback *wb;
>   bool locked, congested;
> + unsigned long flags;
>  
> - wb = unlocked_inode_to_wb_begin(inode, &locked);
> + wb = unlocked_inode_to_wb_begin(inode, &locked, &flags);

Wouldn't it be better to have a cookie (struct) rather than 2 parameters
and let unlocked_inode_to_wb_end DTRT?

>   congested = wb_congested(wb, cong_bits);
> - unlocked_inode_to_wb_end(inode, locked);
> + unlocked_inode_to_wb_end(inode, locked, flags);
>   return congested;
>   }
-- 
Michal Hocko
SUSE Labs


[PATCH] writeback: safer lock nesting

2018-04-06 Thread Greg Thelen
lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
the page's memcg is undergoing move accounting, which occurs when a
process leaves its memcg for a new one that has
memory.move_charge_at_immigrate set.

unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if the
given inode is switching writeback domains.  Switches occur when enough
writes are issued from a new domain.

This existing pattern is thus suspicious:
lock_page_memcg(page);
unlocked_inode_to_wb_begin(inode, &locked);
...
unlocked_inode_to_wb_end(inode, locked);
unlock_page_memcg(page);
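
For illustration, the two "end" paths look roughly like this (simplified from
the current sources; error handling and details elided):

  /* memcg side: restores the irq state saved by lock_page_memcg() */
  void unlock_page_memcg(struct page *page)
  {
          struct mem_cgroup *memcg = page->mem_cgroup;

          if (memcg && memcg->move_lock_task == current) {
                  unsigned long flags = memcg->move_lock_flags;

                  memcg->move_lock_task = NULL;
                  memcg->move_lock_flags = 0;
                  spin_unlock_irqrestore(&memcg->move_lock, flags);
          }
          rcu_read_unlock();
  }

  /* wb side: unconditionally re-enables interrupts when locked */
  static inline void unlocked_inode_to_wb_end(struct inode *inode, bool locked)
  {
          if (unlikely(locked))
                  spin_unlock_irq(&inode->i_mapping->tree_lock);
          rcu_read_unlock();
  }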

If an inode switch and a process memcg migration are both in-flight, then
unlocked_inode_to_wb_end() will unconditionally enable interrupts while
still holding the lock_page_memcg() irq spinlock.  This suggests the
possibility of deadlock if an interrupt occurs before
unlock_page_memcg().

truncate
  __cancel_dirty_page
    lock_page_memcg
    unlocked_inode_to_wb_begin
    unlocked_inode_to_wb_end
    <interrupts mistakenly enabled>
                                   <interrupt>
                                   end_page_writeback
                                   test_clear_page_writeback
                                   lock_page_memcg
                                   <deadlock>
    unlock_page_memcg

Due to configuration limitations this deadlock is not currently possible
because we don't mix cgroup writeback (a cgroupv2 feature) and
memory.move_charge_at_immigrate (a cgroupv1 feature).

If the kernel is hacked to always claim inode switching and memcg
moving_account, then this script triggers lockup in less than a minute:
  cd /mnt/cgroup/memory
  mkdir a b
  echo 1 > a/memory.move_charge_at_immigrate
  echo 1 > b/memory.move_charge_at_immigrate
  (
echo $BASHPID > a/cgroup.procs
while true; do
  dd if=/dev/zero of=/mnt/big bs=1M count=256
done
  ) &
  while true; do
sync
  done &
  sleep 1h &
  SLEEP=$!
  while true; do
echo $SLEEP > a/cgroup.procs
echo $SLEEP > b/cgroup.procs
  done

Given that the deadlock is not currently possible, it's debatable whether
there's any reason to modify the kernel.  I suggest we should, to prevent
future surprises.

Reported-by: Wang Long 
Signed-off-by: Greg Thelen 
---
 fs/fs-writeback.c   |  5 +++--
 include/linux/backing-dev.h | 18 --
 mm/page-writeback.c | 15 +--
 3 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index d4d04fee568a..d51bae5a53e2 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -746,10 +746,11 @@ int inode_congested(struct inode *inode, int cong_bits)
if (inode && inode_to_wb_is_valid(inode)) {
struct bdi_writeback *wb;
bool locked, congested;
+   unsigned long flags;
 
-   wb = unlocked_inode_to_wb_begin(inode, &locked);
+   wb = unlocked_inode_to_wb_begin(inode, &locked, &flags);
congested = wb_congested(wb, cong_bits);
-   unlocked_inode_to_wb_end(inode, locked);
+   unlocked_inode_to_wb_end(inode, locked, flags);
return congested;
}
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 3e4ce54d84ab..6c74b64d6f56 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -347,6 +347,7 @@ static inline struct bdi_writeback *inode_to_wb(const struct inode *inode)
  * unlocked_inode_to_wb_begin - begin unlocked inode wb access transaction
  * @inode: target inode
  * @lockedp: temp bool output param, to be passed to the end function
+ * @flags: saved irq flags, to be passed to the end function
  *
  * The caller wants to access the wb associated with @inode but isn't
  * holding inode->i_lock, mapping->tree_lock or wb->list_lock.  This
@@ -359,7 +360,8 @@ static inline struct bdi_writeback *inode_to_wb(const struct inode *inode)
  * disabled on return.
  */
 static inline struct bdi_writeback *
-unlocked_inode_to_wb_begin(struct inode *inode, bool *lockedp)
+unlocked_inode_to_wb_begin(struct inode *inode, bool *lockedp,
+  unsigned long *flags)
 {
rcu_read_lock();
 
@@ -370,7 +372,7 @@ unlocked_inode_to_wb_begin(struct inode *inode, bool *lockedp)
*lockedp = smp_load_acquire(&inode->i_state) & I_WB_SWITCH;
 
if (unlikely(*lockedp))
-   spin_lock_irq(&inode->i_mapping->tree_lock);
+   spin_lock_irqsave(&inode->i_mapping->tree_lock, *flags);
 
/*
 * Protected by either !I_WB_SWITCH + rcu_read_lock() or tree_lock.
@@ -383,11 +385,13 @@ unlocked_inode_to_wb_begin(struct inode *inode, bool *lockedp)
  * unlocked_inode_to_wb_end - end inode wb access transaction
  * @inode: target inode
  * @locked: *@lockedp from unlocked_inode_to_wb_begin()
+ * @flags: *@flags from unlocked_inode_to_wb_begin()
  */
-static inline void unlocked_inode_to_wb_end(struct inode *inode, bool locked)
+static inline void unlocked_inode_to_wb_end(struct inode *inode,