Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-13 Thread Oleg Nesterov
Dave, I am sorry for delay.

On 10/10, Dave Chinner wrote:
>
> So, it's time to waste more time explaining why lockdep is telling
> us about something that *isn't a bug*.
> [... snip ...]

OK, thanks. I am not surprised although I have to admit I wasn't sure.

> Basically, what we are seeing here is yet another case of "lockdep
> is just smart enough to be really dumb" because we cannot fully
> express or cleanly annotate the contexts in which it is being asked
> to validate.

Yes... perhaps we can add the new lockdep helpers to avoid the false-
positives like this one, but so far it is not clear to me what we can do.
Somehow we need to tell it to to avoid check_prev_add() because we know
that the work function won't take sb_internal, but at the same time we
should complain if it actually does this. Lets ignore this patch for now.

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-13 Thread Oleg Nesterov
Dave, I am sorry for delay.

On 10/10, Dave Chinner wrote:
>
> So, it's time to waste more time explaining why lockdep is telling
> us about something that *isn't a bug*.
> [... snip ...]

OK, thanks. I am not surprised although I have to admit I wasn't sure.

> Basically, what we are seeing here is yet another case of "lockdep
> is just smart enough to be really dumb" because we cannot fully
> express or cleanly annotate the contexts in which it is being asked
> to validate.

Yes... perhaps we can add the new lockdep helpers to avoid the false-
positives like this one, but so far it is not clear to me what we can do.
Somehow we need to tell it to to avoid check_prev_add() because we know
that the work function won't take sb_internal, but at the same time we
should complain if it actually does this. Lets ignore this patch for now.

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-09 Thread Dave Chinner
On Sun, Oct 09, 2016 at 06:14:57PM +0200, Oleg Nesterov wrote:
> On 10/08, Dave Chinner wrote:
> >
> > On Fri, Oct 07, 2016 at 07:15:18PM +0200, Oleg Nesterov wrote:
> > > > >
> > > > > --- x/fs/xfs/xfs_trans.c
> > > > > +++ x/fs/xfs/xfs_trans.c
> > > > > @@ -245,7 +245,8 @@ xfs_trans_alloc(
> > > > >   atomic_inc(>m_active_trans);
> > > > >
> > > > >   tp = kmem_zone_zalloc(xfs_trans_zone,
> > > > > - (flags & XFS_TRANS_NOFS) ? KM_NOFS : KM_SLEEP);
> > > > > + (flags & (XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT))
> > > > > + ? KM_NOFS : KM_SLEEP);
> > > > >   tp->t_magic = XFS_TRANS_HEADER_MAGIC;
> > > > >   tp->t_flags = flags;
> > > > >   tp->t_mountp = mp;
> > > >
> > > > Brief examination says caller should set XFS_TRANS_NOFS, not change
> > > > the implementation to make XFS_TRANS_NO_WRITECOUNT flag to also mean
> > > > XFS_TRANS_NOFS.
> > >
> > > I didn't mean the change above can fix the problem, and I don't really
> > > understand your suggestion.
> >
> > xfs_syncsb() does:
> >
> > tp = xfs_trans_alloc(... , XFS_TRANS_NO_WRITECOUNT, );
> >
> > but it's running in a GFP_NOFS context when a freeze is being
> > finalised. SO, rather than changing what XFS_TRANS_NO_WRITECOUNT
> > does in xfs_trans_alloc(), we should tell it to do a GFP_NOFS
> > allocation. i.e.
> >
> > tp = xfs_trans_alloc(... , XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT, 
> > );
> 
> Ah. This is clear but I am not sure it is enough,
> 
> > > Obviously any GFP_FS allocation in xfs_fs_freeze()
> > > paths will trigger the same warning.
> >
> > Of which there should be none except for that xfs_trans_alloc()
> > call.
> 
> Really?

/Should/ is an indication of *intent*. Reality is that there may be
*bugs*. We all know that testing can't prove the absence of bugs, so
even after years and years of exercising the code with producing any
evidence there may still be problems.

So, it's time to waste more time explaining why lockdep is telling
us about something that *isn't a bug*.

> Again, I can be easily wrong, but when I look at xfs_freeze_fs()
> paths I can see
> 
> xfs_fs_freeze()->xfs_quiesce_attr()->xfs_log_quiesce()->xfs_log_unmount_write()
> ->xfs_log_reserve()->xlog_ticket_alloc(KM_SLEEP)

So, the problem being indicated here is that memory reclaim might
either try to a) write back dirty pages (which require allocation
transactions), b) might run a shrinker that requires running a
transaction or c) we might run periodic background inode reclaim.

For the case of a), this /can't happen/ because we've already run
the part of a freeze that stops pages being dirtied and then written
them all back and made them clean. So we won't run transactions from
page cache reclaim and so we can't deadlock.

For the case of b), well, that's even easier - the only shrinker
path we care about here is inode reclaim through super_cache_scan().
Before that shrinker runs anything it calls:

if (!trylock_super(sb))
return SHRINK_STOP;

Now, we're running memory allocation for the freeze context, which
means we are holding the sb->s_umount semaphore in write mode. That
means the shrinker is going to /fail to lock the superblock/ and
therefore not run any reclaim on that superblock.

IOWs, while we hold the s_umount lock in write mode across a memory
allocation, the shrinkers run in GFP_NOFS mode automatically. So we
can't run transactions from memory reclaim and so we will can't
deadlock.

For the case of c), xfs_quiesce_attr() does this:

/* force the log to unpin objects from the now complete transactions */
xfs_log_force(mp, XFS_LOG_SYNC);

/* reclaim inodes to do any IO before the freeze completes */
xfs_reclaim_inodes(mp, 0);
xfs_reclaim_inodes(mp, SYNC_WAIT);

We basically unpin, clean, and reclaim all the unused inodes the XFS
inode cache. With the shrinker not reclaiming any inodes, and there
being no cached, dirty, unreclaimed inodes remaining in the cache,
there can be no background memory allocations, transactions or IO
triggered memory reclaim in this filesystem. At this point, memory
reclaim /should never block/ trying to reclaim objects from this
filesystem that require transactions to free.

>From this, it should be obvious that we don't even need to
change the code in xfs_syncsb() - in the freeze context that the
allocation is run we've got a clean filesystem where memory reclaim
won't block on the filesystem being frozen, so the code is safe as
it stands.

> > > just for testing purposes and after that I got another warning below. I 
> > > didn't
> > > read it carefully yet, but _at first glance_ it looks like the lock 
> > > inversion
> > > uncovered by 2/2, although I can be easily wrong. 
> > > cancel_delayed_work_sync(l_work)
> > > under sb_internal can hang if xfs_log_worker() waits for this rwsem?`
> >
> > Actually: I *can't read it*. I've got no fucking clue what lockdep
> > is trying to say 

Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-09 Thread Dave Chinner
On Sun, Oct 09, 2016 at 06:14:57PM +0200, Oleg Nesterov wrote:
> On 10/08, Dave Chinner wrote:
> >
> > On Fri, Oct 07, 2016 at 07:15:18PM +0200, Oleg Nesterov wrote:
> > > > >
> > > > > --- x/fs/xfs/xfs_trans.c
> > > > > +++ x/fs/xfs/xfs_trans.c
> > > > > @@ -245,7 +245,8 @@ xfs_trans_alloc(
> > > > >   atomic_inc(>m_active_trans);
> > > > >
> > > > >   tp = kmem_zone_zalloc(xfs_trans_zone,
> > > > > - (flags & XFS_TRANS_NOFS) ? KM_NOFS : KM_SLEEP);
> > > > > + (flags & (XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT))
> > > > > + ? KM_NOFS : KM_SLEEP);
> > > > >   tp->t_magic = XFS_TRANS_HEADER_MAGIC;
> > > > >   tp->t_flags = flags;
> > > > >   tp->t_mountp = mp;
> > > >
> > > > Brief examination says caller should set XFS_TRANS_NOFS, not change
> > > > the implementation to make XFS_TRANS_NO_WRITECOUNT flag to also mean
> > > > XFS_TRANS_NOFS.
> > >
> > > I didn't mean the change above can fix the problem, and I don't really
> > > understand your suggestion.
> >
> > xfs_syncsb() does:
> >
> > tp = xfs_trans_alloc(... , XFS_TRANS_NO_WRITECOUNT, );
> >
> > but it's running in a GFP_NOFS context when a freeze is being
> > finalised. SO, rather than changing what XFS_TRANS_NO_WRITECOUNT
> > does in xfs_trans_alloc(), we should tell it to do a GFP_NOFS
> > allocation. i.e.
> >
> > tp = xfs_trans_alloc(... , XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT, 
> > );
> 
> Ah. This is clear but I am not sure it is enough,
> 
> > > Obviously any GFP_FS allocation in xfs_fs_freeze()
> > > paths will trigger the same warning.
> >
> > Of which there should be none except for that xfs_trans_alloc()
> > call.
> 
> Really?

/Should/ is an indication of *intent*. Reality is that there may be
*bugs*. We all know that testing can't prove the absence of bugs, so
even after years and years of exercising the code with producing any
evidence there may still be problems.

So, it's time to waste more time explaining why lockdep is telling
us about something that *isn't a bug*.

> Again, I can be easily wrong, but when I look at xfs_freeze_fs()
> paths I can see
> 
> xfs_fs_freeze()->xfs_quiesce_attr()->xfs_log_quiesce()->xfs_log_unmount_write()
> ->xfs_log_reserve()->xlog_ticket_alloc(KM_SLEEP)

So, the problem being indicated here is that memory reclaim might
either try to a) write back dirty pages (which require allocation
transactions), b) might run a shrinker that requires running a
transaction or c) we might run periodic background inode reclaim.

For the case of a), this /can't happen/ because we've already run
the part of a freeze that stops pages being dirtied and then written
them all back and made them clean. So we won't run transactions from
page cache reclaim and so we can't deadlock.

For the case of b), well, that's even easier - the only shrinker
path we care about here is inode reclaim through super_cache_scan().
Before that shrinker runs anything it calls:

if (!trylock_super(sb))
return SHRINK_STOP;

Now, we're running memory allocation for the freeze context, which
means we are holding the sb->s_umount semaphore in write mode. That
means the shrinker is going to /fail to lock the superblock/ and
therefore not run any reclaim on that superblock.

IOWs, while we hold the s_umount lock in write mode across a memory
allocation, the shrinkers run in GFP_NOFS mode automatically. So we
can't run transactions from memory reclaim and so we will can't
deadlock.

For the case of c), xfs_quiesce_attr() does this:

/* force the log to unpin objects from the now complete transactions */
xfs_log_force(mp, XFS_LOG_SYNC);

/* reclaim inodes to do any IO before the freeze completes */
xfs_reclaim_inodes(mp, 0);
xfs_reclaim_inodes(mp, SYNC_WAIT);

We basically unpin, clean, and reclaim all the unused inodes the XFS
inode cache. With the shrinker not reclaiming any inodes, and there
being no cached, dirty, unreclaimed inodes remaining in the cache,
there can be no background memory allocations, transactions or IO
triggered memory reclaim in this filesystem. At this point, memory
reclaim /should never block/ trying to reclaim objects from this
filesystem that require transactions to free.

>From this, it should be obvious that we don't even need to
change the code in xfs_syncsb() - in the freeze context that the
allocation is run we've got a clean filesystem where memory reclaim
won't block on the filesystem being frozen, so the code is safe as
it stands.

> > > just for testing purposes and after that I got another warning below. I 
> > > didn't
> > > read it carefully yet, but _at first glance_ it looks like the lock 
> > > inversion
> > > uncovered by 2/2, although I can be easily wrong. 
> > > cancel_delayed_work_sync(l_work)
> > > under sb_internal can hang if xfs_log_worker() waits for this rwsem?`
> >
> > Actually: I *can't read it*. I've got no fucking clue what lockdep
> > is trying to say 

Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-09 Thread Oleg Nesterov
On 10/08, Dave Chinner wrote:
>
> On Fri, Oct 07, 2016 at 07:15:18PM +0200, Oleg Nesterov wrote:
> > > >
> > > > --- x/fs/xfs/xfs_trans.c
> > > > +++ x/fs/xfs/xfs_trans.c
> > > > @@ -245,7 +245,8 @@ xfs_trans_alloc(
> > > > atomic_inc(>m_active_trans);
> > > >
> > > > tp = kmem_zone_zalloc(xfs_trans_zone,
> > > > -   (flags & XFS_TRANS_NOFS) ? KM_NOFS : KM_SLEEP);
> > > > +   (flags & (XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT))
> > > > +   ? KM_NOFS : KM_SLEEP);
> > > > tp->t_magic = XFS_TRANS_HEADER_MAGIC;
> > > > tp->t_flags = flags;
> > > > tp->t_mountp = mp;
> > >
> > > Brief examination says caller should set XFS_TRANS_NOFS, not change
> > > the implementation to make XFS_TRANS_NO_WRITECOUNT flag to also mean
> > > XFS_TRANS_NOFS.
> >
> > I didn't mean the change above can fix the problem, and I don't really
> > understand your suggestion.
>
> xfs_syncsb() does:
>
>   tp = xfs_trans_alloc(... , XFS_TRANS_NO_WRITECOUNT, );
>
> but it's running in a GFP_NOFS context when a freeze is being
> finalised. SO, rather than changing what XFS_TRANS_NO_WRITECOUNT
> does in xfs_trans_alloc(), we should tell it to do a GFP_NOFS
> allocation. i.e.
>
>   tp = xfs_trans_alloc(... , XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT, 
> );

Ah. This is clear but I am not sure it is enough,

> > Obviously any GFP_FS allocation in xfs_fs_freeze()
> > paths will trigger the same warning.
>
> Of which there should be none except for that xfs_trans_alloc()
> call.

Really? Again, I can be easily wrong, but when I look at xfs_freeze_fs()
paths I can see

xfs_fs_freeze()->xfs_quiesce_attr()->xfs_log_quiesce()->xfs_log_unmount_write()
->xfs_log_reserve()->xlog_ticket_alloc(KM_SLEEP)

at least. But I can test the change above, perhaps this call chain is
not possible...

> > I added this hack
> >
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1333,10 +1333,15 @@ xfs_fs_freeze(
> > struct super_block  *sb)
> >  {
> > struct xfs_mount*mp = XFS_M(sb);
> > +   int ret;
> >
> > +   current->flags |= PF_FSTRANS; // tell kmem_flags_convert() to 
> > remove GFP_FS
> > xfs_save_resvblks(mp);
> > xfs_quiesce_attr(mp);
> > -   return xfs_sync_sb(mp, true);
> > +   ret = xfs_sync_sb(mp, true);
> > +   current->flags &= ~PF_FSTRANS;
> > +
> > +   return ret;
> >  }
>
> /me shudders

don't worry, this debugging change won't escape my testing machine!

> > just for testing purposes and after that I got another warning below. I 
> > didn't
> > read it carefully yet, but _at first glance_ it looks like the lock 
> > inversion
> > uncovered by 2/2, although I can be easily wrong. 
> > cancel_delayed_work_sync(l_work)
> > under sb_internal can hang if xfs_log_worker() waits for this rwsem?`
>
> Actually: I *can't read it*. I've got no fucking clue what lockdep
> is trying to say here. This /looks/ like a lockdep is getting
> confused

I can almost never understand what lockdep tells me, it is too clever for me.
But this time I think it is right.

Suppose that freeze_super() races with xfs_log_worker() callback.

freeze_super() takes sb_internal lock and xfs_log_quiesce() calls
cancel_delayed_work_sync(l_work). This will sleep until xfs_log_worker()
finishes.

xfs_log_worker() does a __GFP_FS alloc, triggers reclaim, and blocks
on the same sb_internal lock. Say, in xfs_do_writepage()->xfs_trans_alloc()
path.

Deadlock. The worker thread can't take sb_internal hold by freeze_super(),
cancel_delayed_work_sync() will sleep forever because xfs_log_worker()
can't finish.

So xfs_log_worker() should run in a GFP_NOFS context too. And perhaps
the change above in xfs_trans_alloc() or in xfs_sync_sb() can help if
it doesn't do other allocatiions, I dunno.

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-09 Thread Oleg Nesterov
On 10/08, Dave Chinner wrote:
>
> On Fri, Oct 07, 2016 at 07:15:18PM +0200, Oleg Nesterov wrote:
> > > >
> > > > --- x/fs/xfs/xfs_trans.c
> > > > +++ x/fs/xfs/xfs_trans.c
> > > > @@ -245,7 +245,8 @@ xfs_trans_alloc(
> > > > atomic_inc(>m_active_trans);
> > > >
> > > > tp = kmem_zone_zalloc(xfs_trans_zone,
> > > > -   (flags & XFS_TRANS_NOFS) ? KM_NOFS : KM_SLEEP);
> > > > +   (flags & (XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT))
> > > > +   ? KM_NOFS : KM_SLEEP);
> > > > tp->t_magic = XFS_TRANS_HEADER_MAGIC;
> > > > tp->t_flags = flags;
> > > > tp->t_mountp = mp;
> > >
> > > Brief examination says caller should set XFS_TRANS_NOFS, not change
> > > the implementation to make XFS_TRANS_NO_WRITECOUNT flag to also mean
> > > XFS_TRANS_NOFS.
> >
> > I didn't mean the change above can fix the problem, and I don't really
> > understand your suggestion.
>
> xfs_syncsb() does:
>
>   tp = xfs_trans_alloc(... , XFS_TRANS_NO_WRITECOUNT, );
>
> but it's running in a GFP_NOFS context when a freeze is being
> finalised. SO, rather than changing what XFS_TRANS_NO_WRITECOUNT
> does in xfs_trans_alloc(), we should tell it to do a GFP_NOFS
> allocation. i.e.
>
>   tp = xfs_trans_alloc(... , XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT, 
> );

Ah. This is clear but I am not sure it is enough,

> > Obviously any GFP_FS allocation in xfs_fs_freeze()
> > paths will trigger the same warning.
>
> Of which there should be none except for that xfs_trans_alloc()
> call.

Really? Again, I can be easily wrong, but when I look at xfs_freeze_fs()
paths I can see

xfs_fs_freeze()->xfs_quiesce_attr()->xfs_log_quiesce()->xfs_log_unmount_write()
->xfs_log_reserve()->xlog_ticket_alloc(KM_SLEEP)

at least. But I can test the change above, perhaps this call chain is
not possible...

> > I added this hack
> >
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1333,10 +1333,15 @@ xfs_fs_freeze(
> > struct super_block  *sb)
> >  {
> > struct xfs_mount*mp = XFS_M(sb);
> > +   int ret;
> >
> > +   current->flags |= PF_FSTRANS; // tell kmem_flags_convert() to 
> > remove GFP_FS
> > xfs_save_resvblks(mp);
> > xfs_quiesce_attr(mp);
> > -   return xfs_sync_sb(mp, true);
> > +   ret = xfs_sync_sb(mp, true);
> > +   current->flags &= ~PF_FSTRANS;
> > +
> > +   return ret;
> >  }
>
> /me shudders

don't worry, this debugging change won't escape my testing machine!

> > just for testing purposes and after that I got another warning below. I 
> > didn't
> > read it carefully yet, but _at first glance_ it looks like the lock 
> > inversion
> > uncovered by 2/2, although I can be easily wrong. 
> > cancel_delayed_work_sync(l_work)
> > under sb_internal can hang if xfs_log_worker() waits for this rwsem?`
>
> Actually: I *can't read it*. I've got no fucking clue what lockdep
> is trying to say here. This /looks/ like a lockdep is getting
> confused

I can almost never understand what lockdep tells me, it is too clever for me.
But this time I think it is right.

Suppose that freeze_super() races with xfs_log_worker() callback.

freeze_super() takes sb_internal lock and xfs_log_quiesce() calls
cancel_delayed_work_sync(l_work). This will sleep until xfs_log_worker()
finishes.

xfs_log_worker() does a __GFP_FS alloc, triggers reclaim, and blocks
on the same sb_internal lock. Say, in xfs_do_writepage()->xfs_trans_alloc()
path.

Deadlock. The worker thread can't take sb_internal hold by freeze_super(),
cancel_delayed_work_sync() will sleep forever because xfs_log_worker()
can't finish.

So xfs_log_worker() should run in a GFP_NOFS context too. And perhaps
the change above in xfs_trans_alloc() or in xfs_sync_sb() can help if
it doesn't do other allocatiions, I dunno.

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-07 Thread Dave Chinner
On Fri, Oct 07, 2016 at 07:15:18PM +0200, Oleg Nesterov wrote:
> On 10/07, Dave Chinner wrote:
> >
> > On Thu, Oct 06, 2016 at 07:17:58PM +0200, Oleg Nesterov wrote:
> > > Probably false positive? Although when I look at the comment above 
> > > xfs_sync_sb()
> > > I think that may be sometging like below makes sense, but I know 
> > > absolutely nothing
> > > about fs/ and XFS in particular.
> > >
> > > Oleg.
> > >
> > >
> > > --- x/fs/xfs/xfs_trans.c
> > > +++ x/fs/xfs/xfs_trans.c
> > > @@ -245,7 +245,8 @@ xfs_trans_alloc(
> > >   atomic_inc(>m_active_trans);
> > >
> > >   tp = kmem_zone_zalloc(xfs_trans_zone,
> > > - (flags & XFS_TRANS_NOFS) ? KM_NOFS : KM_SLEEP);
> > > + (flags & (XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT))
> > > + ? KM_NOFS : KM_SLEEP);
> > >   tp->t_magic = XFS_TRANS_HEADER_MAGIC;
> > >   tp->t_flags = flags;
> > >   tp->t_mountp = mp;
> >
> > Brief examination says caller should set XFS_TRANS_NOFS, not change
> > the implementation to make XFS_TRANS_NO_WRITECOUNT flag to also mean
> > XFS_TRANS_NOFS.
> 
> I didn't mean the change above can fix the problem, and I don't really
> understand your suggestion.

xfs_syncsb() does:

tp = xfs_trans_alloc(... , XFS_TRANS_NO_WRITECOUNT, );

but it's running in a GFP_NOFS context when a freeze is being
finalised. SO, rather than changing what XFS_TRANS_NO_WRITECOUNT
does in xfs_trans_alloc(), we should tell it to do a GFP_NOFS
allocation. i.e.

tp = xfs_trans_alloc(... , XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT, 
);

and nothing inside xfs_trans_alloc() changes at all.

> Obviously any GFP_FS allocation in xfs_fs_freeze()
> paths will trigger the same warning.

Of which there should be none except for that xfs_trans_alloc()
call.

> I added this hack
> 
>   --- a/fs/xfs/xfs_super.c
>   +++ b/fs/xfs/xfs_super.c
>   @@ -1333,10 +1333,15 @@ xfs_fs_freeze(
>   struct super_block  *sb)
>{
>   struct xfs_mount*mp = XFS_M(sb);
>   +   int ret;
>
>   +   current->flags |= PF_FSTRANS; // tell kmem_flags_convert() to 
> remove GFP_FS
>   xfs_save_resvblks(mp);
>   xfs_quiesce_attr(mp);
>   -   return xfs_sync_sb(mp, true);
>   +   ret = xfs_sync_sb(mp, true);
>   +   current->flags &= ~PF_FSTRANS;
>   +
>   +   return ret;
>}

/me shudders

> just for testing purposes and after that I got another warning below. I didn't
> read it carefully yet, but _at first glance_ it looks like the lock inversion
> uncovered by 2/2, although I can be easily wrong. 
> cancel_delayed_work_sync(l_work)
> under sb_internal can hang if xfs_log_worker() waits for this rwsem?`

Actually: I *can't read it*. I've got no fucking clue what lockdep
is trying to say here. This /looks/ like a lockdep is getting
confused between memory reclaim contexts (which /aren't locks/ but
overload interrupt levels) and freeze contexts (which /aren't locks/)
and workqueue locks which /aren't nested/ inside transactions or
freeze contexts. But, really, I can't follow it because I have to
guess at what "lock contexts" are not locks but are something else.

cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-07 Thread Dave Chinner
On Fri, Oct 07, 2016 at 07:15:18PM +0200, Oleg Nesterov wrote:
> On 10/07, Dave Chinner wrote:
> >
> > On Thu, Oct 06, 2016 at 07:17:58PM +0200, Oleg Nesterov wrote:
> > > Probably false positive? Although when I look at the comment above 
> > > xfs_sync_sb()
> > > I think that may be sometging like below makes sense, but I know 
> > > absolutely nothing
> > > about fs/ and XFS in particular.
> > >
> > > Oleg.
> > >
> > >
> > > --- x/fs/xfs/xfs_trans.c
> > > +++ x/fs/xfs/xfs_trans.c
> > > @@ -245,7 +245,8 @@ xfs_trans_alloc(
> > >   atomic_inc(>m_active_trans);
> > >
> > >   tp = kmem_zone_zalloc(xfs_trans_zone,
> > > - (flags & XFS_TRANS_NOFS) ? KM_NOFS : KM_SLEEP);
> > > + (flags & (XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT))
> > > + ? KM_NOFS : KM_SLEEP);
> > >   tp->t_magic = XFS_TRANS_HEADER_MAGIC;
> > >   tp->t_flags = flags;
> > >   tp->t_mountp = mp;
> >
> > Brief examination says caller should set XFS_TRANS_NOFS, not change
> > the implementation to make XFS_TRANS_NO_WRITECOUNT flag to also mean
> > XFS_TRANS_NOFS.
> 
> I didn't mean the change above can fix the problem, and I don't really
> understand your suggestion.

xfs_syncsb() does:

tp = xfs_trans_alloc(... , XFS_TRANS_NO_WRITECOUNT, );

but it's running in a GFP_NOFS context when a freeze is being
finalised. SO, rather than changing what XFS_TRANS_NO_WRITECOUNT
does in xfs_trans_alloc(), we should tell it to do a GFP_NOFS
allocation. i.e.

tp = xfs_trans_alloc(... , XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT, 
);

and nothing inside xfs_trans_alloc() changes at all.

> Obviously any GFP_FS allocation in xfs_fs_freeze()
> paths will trigger the same warning.

Of which there should be none except for that xfs_trans_alloc()
call.

> I added this hack
> 
>   --- a/fs/xfs/xfs_super.c
>   +++ b/fs/xfs/xfs_super.c
>   @@ -1333,10 +1333,15 @@ xfs_fs_freeze(
>   struct super_block  *sb)
>{
>   struct xfs_mount*mp = XFS_M(sb);
>   +   int ret;
>
>   +   current->flags |= PF_FSTRANS; // tell kmem_flags_convert() to 
> remove GFP_FS
>   xfs_save_resvblks(mp);
>   xfs_quiesce_attr(mp);
>   -   return xfs_sync_sb(mp, true);
>   +   ret = xfs_sync_sb(mp, true);
>   +   current->flags &= ~PF_FSTRANS;
>   +
>   +   return ret;
>}

/me shudders

> just for testing purposes and after that I got another warning below. I didn't
> read it carefully yet, but _at first glance_ it looks like the lock inversion
> uncovered by 2/2, although I can be easily wrong. 
> cancel_delayed_work_sync(l_work)
> under sb_internal can hang if xfs_log_worker() waits for this rwsem?`

Actually: I *can't read it*. I've got no fucking clue what lockdep
is trying to say here. This /looks/ like a lockdep is getting
confused between memory reclaim contexts (which /aren't locks/ but
overload interrupt levels) and freeze contexts (which /aren't locks/)
and workqueue locks which /aren't nested/ inside transactions or
freeze contexts. But, really, I can't follow it because I have to
guess at what "lock contexts" are not locks but are something else.

cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-07 Thread Oleg Nesterov
On 10/07, Dave Chinner wrote:
>
> On Thu, Oct 06, 2016 at 07:17:58PM +0200, Oleg Nesterov wrote:
> > Probably false positive? Although when I look at the comment above 
> > xfs_sync_sb()
> > I think that may be sometging like below makes sense, but I know absolutely 
> > nothing
> > about fs/ and XFS in particular.
> >
> > Oleg.
> >
> >
> > --- x/fs/xfs/xfs_trans.c
> > +++ x/fs/xfs/xfs_trans.c
> > @@ -245,7 +245,8 @@ xfs_trans_alloc(
> > atomic_inc(>m_active_trans);
> >
> > tp = kmem_zone_zalloc(xfs_trans_zone,
> > -   (flags & XFS_TRANS_NOFS) ? KM_NOFS : KM_SLEEP);
> > +   (flags & (XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT))
> > +   ? KM_NOFS : KM_SLEEP);
> > tp->t_magic = XFS_TRANS_HEADER_MAGIC;
> > tp->t_flags = flags;
> > tp->t_mountp = mp;
>
> Brief examination says caller should set XFS_TRANS_NOFS, not change
> the implementation to make XFS_TRANS_NO_WRITECOUNT flag to also mean
> XFS_TRANS_NOFS.

I didn't mean the change above can fix the problem, and I don't really
understand your suggestion. Obviously any GFP_FS allocation in xfs_fs_freeze()
paths will trigger the same warning.

I added this hack

--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1333,10 +1333,15 @@ xfs_fs_freeze(
struct super_block  *sb)
 {
struct xfs_mount*mp = XFS_M(sb);
+   int ret;
 
+   current->flags |= PF_FSTRANS; // tell kmem_flags_convert() to 
remove GFP_FS
xfs_save_resvblks(mp);
xfs_quiesce_attr(mp);
-   return xfs_sync_sb(mp, true);
+   ret = xfs_sync_sb(mp, true);
+   current->flags &= ~PF_FSTRANS;
+
+   return ret;
 }

just for testing purposes and after that I got another warning below. I didn't
read it carefully yet, but _at first glance_ it looks like the lock inversion
uncovered by 2/2, although I can be easily wrong. 
cancel_delayed_work_sync(l_work)
under sb_internal can hang if xfs_log_worker() waits for this rwsem?`

Oleg.

=
[ INFO: possible irq lock inversion dependency detected ]
4.8.0+ #10 Tainted: GW  
-
kswapd0/32 just changed the state of lock:
 (sb_internal){.?}, at: [] __sb_start_write+0xb7/0xf0
but this lock took another, RECLAIM_FS-unsafe lock in the past:
 ((&(>l_work)->work)){+.+.+.}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:
   CPU0CPU1
   
  lock((&(>l_work)->work));
   local_irq_disable();
   lock(sb_internal);
   lock((&(>l_work)->work));
  
lock(sb_internal);

 *** DEADLOCK ***
no locks held by kswapd0/32.

the shortest dependencies between 2nd lock and 1st lock:
 -> ((&(>l_work)->work)){+.+.+.} ops: 803 {
HARDIRQ-ON-W at:
  [] __lock_acquire+0x611/0x1870
  [] lock_acquire+0x10d/0x200
  [] process_one_work+0x1c9/0x690
  [] worker_thread+0x4e/0x480
  [] kthread+0xf3/0x110
  [] ret_from_fork+0x1f/0x40
SOFTIRQ-ON-W at:
  [] __lock_acquire+0x643/0x1870
  [] lock_acquire+0x10d/0x200
  [] process_one_work+0x1c9/0x690
  [] worker_thread+0x4e/0x480
  [] kthread+0xf3/0x110
  [] ret_from_fork+0x1f/0x40
RECLAIM_FS-ON-W at:
 [] mark_held_locks+0x6f/0xa0
 [] lockdep_trace_alloc+0xd3/0x120
 [] kmem_cache_alloc+0x2f/0x280
 [] kmem_zone_alloc+0x81/0x120
 [] xfs_trans_alloc+0x6c/0x130
 [] xfs_sync_sb+0x39/0x80
 [] xfs_log_worker+0xd4/0xf0
 [] process_one_work+0x1f0/0x690
 [] worker_thread+0x4e/0x480
 [] kthread+0xf3/0x110
 [] ret_from_fork+0x1f/0x40
INITIAL USE at:
 [] __lock_acquire+0x24b/0x1870
 [] lock_acquire+0x10d/0x200
 [] process_one_work+0x1c9/0x690
 [] worker_thread+0x4e/0x480
 [] kthread+0xf3/0x110
 [] ret_from_fork+0x1f/0x40
  }
  ... key  at: [] __key.130101+0x0/0x8
  ... acquired at:
   [] lock_acquire+0x10d/0x200
   [] flush_work+0x4c/0x2c0
   [] __cancel_work_timer+0x131/0x1f0
   [] cancel_delayed_work_sync+0x13/0x20
   [] xfs_log_quiesce+0x34/0x3b0
   [] xfs_quiesce_attr+0x6d/0xb0
   [] xfs_fs_freeze+0x33/0x50
   [] 

Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-07 Thread Oleg Nesterov
On 10/07, Dave Chinner wrote:
>
> On Thu, Oct 06, 2016 at 07:17:58PM +0200, Oleg Nesterov wrote:
> > Probably false positive? Although when I look at the comment above 
> > xfs_sync_sb()
> > I think that may be sometging like below makes sense, but I know absolutely 
> > nothing
> > about fs/ and XFS in particular.
> >
> > Oleg.
> >
> >
> > --- x/fs/xfs/xfs_trans.c
> > +++ x/fs/xfs/xfs_trans.c
> > @@ -245,7 +245,8 @@ xfs_trans_alloc(
> > atomic_inc(>m_active_trans);
> >
> > tp = kmem_zone_zalloc(xfs_trans_zone,
> > -   (flags & XFS_TRANS_NOFS) ? KM_NOFS : KM_SLEEP);
> > +   (flags & (XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT))
> > +   ? KM_NOFS : KM_SLEEP);
> > tp->t_magic = XFS_TRANS_HEADER_MAGIC;
> > tp->t_flags = flags;
> > tp->t_mountp = mp;
>
> Brief examination says caller should set XFS_TRANS_NOFS, not change
> the implementation to make XFS_TRANS_NO_WRITECOUNT flag to also mean
> XFS_TRANS_NOFS.

I didn't mean the change above can fix the problem, and I don't really
understand your suggestion. Obviously any GFP_FS allocation in xfs_fs_freeze()
paths will trigger the same warning.

I added this hack

--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1333,10 +1333,15 @@ xfs_fs_freeze(
struct super_block  *sb)
 {
struct xfs_mount*mp = XFS_M(sb);
+   int ret;
 
+   current->flags |= PF_FSTRANS; // tell kmem_flags_convert() to 
remove GFP_FS
xfs_save_resvblks(mp);
xfs_quiesce_attr(mp);
-   return xfs_sync_sb(mp, true);
+   ret = xfs_sync_sb(mp, true);
+   current->flags &= ~PF_FSTRANS;
+
+   return ret;
 }

just for testing purposes and after that I got another warning below. I didn't
read it carefully yet, but _at first glance_ it looks like the lock inversion
uncovered by 2/2, although I can be easily wrong. 
cancel_delayed_work_sync(l_work)
under sb_internal can hang if xfs_log_worker() waits for this rwsem?`

Oleg.

=
[ INFO: possible irq lock inversion dependency detected ]
4.8.0+ #10 Tainted: GW  
-
kswapd0/32 just changed the state of lock:
 (sb_internal){.?}, at: [] __sb_start_write+0xb7/0xf0
but this lock took another, RECLAIM_FS-unsafe lock in the past:
 ((&(>l_work)->work)){+.+.+.}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:
   CPU0CPU1
   
  lock((&(>l_work)->work));
   local_irq_disable();
   lock(sb_internal);
   lock((&(>l_work)->work));
  
lock(sb_internal);

 *** DEADLOCK ***
no locks held by kswapd0/32.

the shortest dependencies between 2nd lock and 1st lock:
 -> ((&(>l_work)->work)){+.+.+.} ops: 803 {
HARDIRQ-ON-W at:
  [] __lock_acquire+0x611/0x1870
  [] lock_acquire+0x10d/0x200
  [] process_one_work+0x1c9/0x690
  [] worker_thread+0x4e/0x480
  [] kthread+0xf3/0x110
  [] ret_from_fork+0x1f/0x40
SOFTIRQ-ON-W at:
  [] __lock_acquire+0x643/0x1870
  [] lock_acquire+0x10d/0x200
  [] process_one_work+0x1c9/0x690
  [] worker_thread+0x4e/0x480
  [] kthread+0xf3/0x110
  [] ret_from_fork+0x1f/0x40
RECLAIM_FS-ON-W at:
 [] mark_held_locks+0x6f/0xa0
 [] lockdep_trace_alloc+0xd3/0x120
 [] kmem_cache_alloc+0x2f/0x280
 [] kmem_zone_alloc+0x81/0x120
 [] xfs_trans_alloc+0x6c/0x130
 [] xfs_sync_sb+0x39/0x80
 [] xfs_log_worker+0xd4/0xf0
 [] process_one_work+0x1f0/0x690
 [] worker_thread+0x4e/0x480
 [] kthread+0xf3/0x110
 [] ret_from_fork+0x1f/0x40
INITIAL USE at:
 [] __lock_acquire+0x24b/0x1870
 [] lock_acquire+0x10d/0x200
 [] process_one_work+0x1c9/0x690
 [] worker_thread+0x4e/0x480
 [] kthread+0xf3/0x110
 [] ret_from_fork+0x1f/0x40
  }
  ... key  at: [] __key.130101+0x0/0x8
  ... acquired at:
   [] lock_acquire+0x10d/0x200
   [] flush_work+0x4c/0x2c0
   [] __cancel_work_timer+0x131/0x1f0
   [] cancel_delayed_work_sync+0x13/0x20
   [] xfs_log_quiesce+0x34/0x3b0
   [] xfs_quiesce_attr+0x6d/0xb0
   [] xfs_fs_freeze+0x33/0x50
   [] 

Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-07 Thread Oleg Nesterov
On 10/06, Johannes Weiner wrote:
>
> On Tue, Oct 04, 2016 at 01:48:00PM +0200, Michal Hocko wrote:
> > Johannes is already looking into this
> > http://lkml.kernel.org/r/20161004093216.ga21...@cmpxchg.org
> >
> > On Tue 04-10-16 13:43:43, Oleg Nesterov wrote:
> > > because of kernel bug:
> > >
> > >   [ 2730.242537] run fstests generic/274 at 2016-10-04 05:17:34
> > >   [ 2730.738352] XFS (loop1): Mounting V5 Filesystem
> > >   [ 2730.741451] XFS (loop1): Ending clean mount
> > >   [ 2744.508698] [ cut here ]
> > >   [ 2744.509190] kernel BUG at ./include/linux/swap.h:276!
> > >
> > > static inline void workingset_node_shadows_dec(struct radix_tree_node 
> > > *node)
> > > {
> > >   VM_BUG_ON(!workingset_node_shadows(node));
> > >   node->count -= 1U << RADIX_TREE_COUNT_SHIFT;
>
> We tracked this one down and Linus merged a fix for this issue:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d3798ae8c6f3767c726403c2ca6ecc317752c9dd

Confirm, generic/274 no longer crashes the kernel.

Thanks.

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-07 Thread Oleg Nesterov
On 10/06, Johannes Weiner wrote:
>
> On Tue, Oct 04, 2016 at 01:48:00PM +0200, Michal Hocko wrote:
> > Johannes is already looking into this
> > http://lkml.kernel.org/r/20161004093216.ga21...@cmpxchg.org
> >
> > On Tue 04-10-16 13:43:43, Oleg Nesterov wrote:
> > > because of kernel bug:
> > >
> > >   [ 2730.242537] run fstests generic/274 at 2016-10-04 05:17:34
> > >   [ 2730.738352] XFS (loop1): Mounting V5 Filesystem
> > >   [ 2730.741451] XFS (loop1): Ending clean mount
> > >   [ 2744.508698] [ cut here ]
> > >   [ 2744.509190] kernel BUG at ./include/linux/swap.h:276!
> > >
> > > static inline void workingset_node_shadows_dec(struct radix_tree_node 
> > > *node)
> > > {
> > >   VM_BUG_ON(!workingset_node_shadows(node));
> > >   node->count -= 1U << RADIX_TREE_COUNT_SHIFT;
>
> We tracked this one down and Linus merged a fix for this issue:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d3798ae8c6f3767c726403c2ca6ecc317752c9dd

Confirm, generic/274 no longer crashes the kernel.

Thanks.

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-06 Thread Dave Chinner
On Thu, Oct 06, 2016 at 07:17:58PM +0200, Oleg Nesterov wrote:
> Probably false positive? Although when I look at the comment above 
> xfs_sync_sb()
> I think that may be sometging like below makes sense, but I know absolutely 
> nothing
> about fs/ and XFS in particular.
> 
> Oleg.
> 
> 
> --- x/fs/xfs/xfs_trans.c
> +++ x/fs/xfs/xfs_trans.c
> @@ -245,7 +245,8 @@ xfs_trans_alloc(
>   atomic_inc(>m_active_trans);
>  
>   tp = kmem_zone_zalloc(xfs_trans_zone,
> - (flags & XFS_TRANS_NOFS) ? KM_NOFS : KM_SLEEP);
> + (flags & (XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT))
> + ? KM_NOFS : KM_SLEEP);
>   tp->t_magic = XFS_TRANS_HEADER_MAGIC;
>   tp->t_flags = flags;
>   tp->t_mountp = mp;

Brief examination says caller should set XFS_TRANS_NOFS, not change
the implementation to make XFS_TRANS_NO_WRITECOUNT flag to also mean
XFS_TRANS_NOFS.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-06 Thread Dave Chinner
On Thu, Oct 06, 2016 at 07:17:58PM +0200, Oleg Nesterov wrote:
> Probably false positive? Although when I look at the comment above 
> xfs_sync_sb()
> I think that may be sometging like below makes sense, but I know absolutely 
> nothing
> about fs/ and XFS in particular.
> 
> Oleg.
> 
> 
> --- x/fs/xfs/xfs_trans.c
> +++ x/fs/xfs/xfs_trans.c
> @@ -245,7 +245,8 @@ xfs_trans_alloc(
>   atomic_inc(>m_active_trans);
>  
>   tp = kmem_zone_zalloc(xfs_trans_zone,
> - (flags & XFS_TRANS_NOFS) ? KM_NOFS : KM_SLEEP);
> + (flags & (XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT))
> + ? KM_NOFS : KM_SLEEP);
>   tp->t_magic = XFS_TRANS_HEADER_MAGIC;
>   tp->t_flags = flags;
>   tp->t_mountp = mp;

Brief examination says caller should set XFS_TRANS_NOFS, not change
the implementation to make XFS_TRANS_NO_WRITECOUNT flag to also mean
XFS_TRANS_NOFS.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-06 Thread Oleg Nesterov
On 10/05, Oleg Nesterov wrote:
>
> On 10/05, Dave Chinner wrote:
> >
> > On Tue, Oct 04, 2016 at 01:43:43PM +0200, Oleg Nesterov wrote:
> >
> > > plus the following warnings:
> > >
> > >   [ 1894.500040] run fstests generic/070 at 2016-10-04 05:03:39
> > >   [ 1895.076655] =
> > >   [ 1895.077136] [ INFO: inconsistent lock state ]
> > >   [ 1895.077574] 4.8.0 #1 Not tainted
> > >   [ 1895.077900] -
> > >   [ 1895.078330] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} 
> > > usage.
> > >   [ 1895.078993] fsstress/18239 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > >   [ 1895.079522]  (_nondir_ilock_class){?-}, at: 
> > > [] xfs_ilock+0x165/0x210 [xfs]
> > >   [ 1895.080529] {IN-RECLAIM_FS-W} state was registered at:
> >
> > And that is a bug in the lockdep annotations for memory allocation because 
> > it
> > fails to take into account the current task flags that are set via
> > memalloc_noio_save() to prevent vmalloc from doing GFP_KERNEL allocations. 
> > i.e.
> > in _xfs_buf_map_pages():
>
> OK, I see...
>
> I'll re-test with the following change:
>
>   --- a/kernel/locking/lockdep.c
>   +++ b/kernel/locking/lockdep.c
>   @@ -2867,7 +2867,7 @@ static void __lockdep_trace_alloc(gfp_t gfp_mask, 
> unsigned long flags)
>   return;
>
>   /* We're only interested __GFP_FS allocations for now */
>   -   if (!(gfp_mask & __GFP_FS))
>   +   if ((curr->flags & PF_MEMALLOC_NOIO) || !(gfp_mask & __GFP_FS))
>   return;
>

and with the change above "./check -b auto" finishes without lockdep warnings,
probably I should send this patch to lockdep maintainers.

Now, with 2/2 applied I got the following:

[ INFO: inconsistent lock state ]
4.8.0+ #4 Tainted: GW
-
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
kswapd0/32 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (sb_internal){+?}, at: [] 
__sb_start_write+0xb7/0xf0
{RECLAIM_FS-ON-W} state was registered at:
  [] mark_held_locks+0x6f/0xa0
  [] lockdep_trace_alloc+0xd3/0x120
  [] kmem_cache_alloc+0x2f/0x280
  [] kmem_zone_alloc+0x81/0x120 [xfs]
  [] xfs_trans_alloc+0x6c/0x130 [xfs]
  [] xfs_sync_sb+0x39/0x80 [xfs]
  [] xfs_log_sbcount+0x4d/0x50 [xfs]
  [] xfs_quiesce_attr+0x57/0xb0 [xfs]
  [] xfs_fs_freeze+0x21/0x40 [xfs]
  [] freeze_super+0xcf/0x190
  [] do_vfs_ioctl+0x55f/0x6c0
  [] SyS_ioctl+0x79/0x90
  [] entry_SYSCALL_64_fastpath+0x1f/0xbd
irq event stamp: 36471805
hardirqs last  enabled at (36471805): [] 
clear_page_dirty_for_io+0x1ed/0x2e0
hardirqs last disabled at (36471804): [] 
clear_page_dirty_for_io+0x1bd/0x2e0
softirqs last  enabled at (36468590): [] 
__do_softirq+0x37a/0x44d
softirqs last disabled at (36468579): [] 
irq_exit+0xe5/0xf0

other info that might help us debug this:
 Possible unsafe locking scenario:
   CPU0
   
  lock(sb_internal);
  
lock(sb_internal);

 *** DEADLOCK ***
no locks held by kswapd0/32.

stack backtrace:
CPU: 0 PID: 32 Comm: kswapd0 Tainted: GW   4.8.0+ #4
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 0086 028a3434 880139b2b520 91449193
 880139b1a680 928c1e70 880139b2b570 91106c75
  0001 88010001 000a
Call Trace:
 [] dump_stack+0x85/0xc2
 [] print_usage_bug+0x215/0x240
 [] mark_lock+0x58b/0x650
 [] ? print_shortest_lock_dependencies+0x1a0/0x1a0
 [] __lock_acquire+0x36d/0x1870
 [] lock_acquire+0x10d/0x200
 [] ? __sb_start_write+0xb7/0xf0
 [] percpu_down_read+0x3c/0x90
 [] ? __sb_start_write+0xb7/0xf0
 [] __sb_start_write+0xb7/0xf0
 [] xfs_trans_alloc+0xe3/0x130 [xfs]
 [] xfs_iomap_write_allocate+0x1f7/0x380 [xfs]
 [] ? xfs_map_blocks+0xe3/0x380 [xfs]
 [] ? rcu_read_lock_sched_held+0x58/0x60
 [] xfs_map_blocks+0x22a/0x380 [xfs]
 [] xfs_do_writepage+0x188/0x6c0 [xfs]
 [] xfs_vm_writepage+0x3b/0x70 [xfs]
 [] pageout.isra.46+0x190/0x380
 [] shrink_page_list+0x9ab/0xa70
 [] shrink_inactive_list+0x252/0x5d0
 [] shrink_node_memcg+0x5af/0x790
 [] shrink_node+0xe1/0x320
 [] kswapd+0x387/0x8b0

Probably false positive? Although when I look at the comment above xfs_sync_sb()
I think that may be sometging like below makes sense, but I know absolutely 
nothing
about fs/ and XFS in particular.

Oleg.


--- x/fs/xfs/xfs_trans.c
+++ x/fs/xfs/xfs_trans.c
@@ -245,7 +245,8 @@ xfs_trans_alloc(
atomic_inc(>m_active_trans);
 
   

Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-06 Thread Oleg Nesterov
On 10/05, Oleg Nesterov wrote:
>
> On 10/05, Dave Chinner wrote:
> >
> > On Tue, Oct 04, 2016 at 01:43:43PM +0200, Oleg Nesterov wrote:
> >
> > > plus the following warnings:
> > >
> > >   [ 1894.500040] run fstests generic/070 at 2016-10-04 05:03:39
> > >   [ 1895.076655] =
> > >   [ 1895.077136] [ INFO: inconsistent lock state ]
> > >   [ 1895.077574] 4.8.0 #1 Not tainted
> > >   [ 1895.077900] -
> > >   [ 1895.078330] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} 
> > > usage.
> > >   [ 1895.078993] fsstress/18239 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > >   [ 1895.079522]  (_nondir_ilock_class){?-}, at: 
> > > [] xfs_ilock+0x165/0x210 [xfs]
> > >   [ 1895.080529] {IN-RECLAIM_FS-W} state was registered at:
> >
> > And that is a bug in the lockdep annotations for memory allocation because 
> > it
> > fails to take into account the current task flags that are set via
> > memalloc_noio_save() to prevent vmalloc from doing GFP_KERNEL allocations. 
> > i.e.
> > in _xfs_buf_map_pages():
>
> OK, I see...
>
> I'll re-test with the following change:
>
>   --- a/kernel/locking/lockdep.c
>   +++ b/kernel/locking/lockdep.c
>   @@ -2867,7 +2867,7 @@ static void __lockdep_trace_alloc(gfp_t gfp_mask, 
> unsigned long flags)
>   return;
>
>   /* We're only interested __GFP_FS allocations for now */
>   -   if (!(gfp_mask & __GFP_FS))
>   +   if ((curr->flags & PF_MEMALLOC_NOIO) || !(gfp_mask & __GFP_FS))
>   return;
>

and with the change above "./check -b auto" finishes without lockdep warnings,
probably I should send this patch to lockdep maintainers.

Now, with 2/2 applied I got the following:

[ INFO: inconsistent lock state ]
4.8.0+ #4 Tainted: GW
-
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
kswapd0/32 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (sb_internal){+?}, at: [] 
__sb_start_write+0xb7/0xf0
{RECLAIM_FS-ON-W} state was registered at:
  [] mark_held_locks+0x6f/0xa0
  [] lockdep_trace_alloc+0xd3/0x120
  [] kmem_cache_alloc+0x2f/0x280
  [] kmem_zone_alloc+0x81/0x120 [xfs]
  [] xfs_trans_alloc+0x6c/0x130 [xfs]
  [] xfs_sync_sb+0x39/0x80 [xfs]
  [] xfs_log_sbcount+0x4d/0x50 [xfs]
  [] xfs_quiesce_attr+0x57/0xb0 [xfs]
  [] xfs_fs_freeze+0x21/0x40 [xfs]
  [] freeze_super+0xcf/0x190
  [] do_vfs_ioctl+0x55f/0x6c0
  [] SyS_ioctl+0x79/0x90
  [] entry_SYSCALL_64_fastpath+0x1f/0xbd
irq event stamp: 36471805
hardirqs last  enabled at (36471805): [] 
clear_page_dirty_for_io+0x1ed/0x2e0
hardirqs last disabled at (36471804): [] 
clear_page_dirty_for_io+0x1bd/0x2e0
softirqs last  enabled at (36468590): [] 
__do_softirq+0x37a/0x44d
softirqs last disabled at (36468579): [] 
irq_exit+0xe5/0xf0

other info that might help us debug this:
 Possible unsafe locking scenario:
   CPU0
   
  lock(sb_internal);
  
lock(sb_internal);

 *** DEADLOCK ***
no locks held by kswapd0/32.

stack backtrace:
CPU: 0 PID: 32 Comm: kswapd0 Tainted: GW   4.8.0+ #4
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 0086 028a3434 880139b2b520 91449193
 880139b1a680 928c1e70 880139b2b570 91106c75
  0001 88010001 000a
Call Trace:
 [] dump_stack+0x85/0xc2
 [] print_usage_bug+0x215/0x240
 [] mark_lock+0x58b/0x650
 [] ? print_shortest_lock_dependencies+0x1a0/0x1a0
 [] __lock_acquire+0x36d/0x1870
 [] lock_acquire+0x10d/0x200
 [] ? __sb_start_write+0xb7/0xf0
 [] percpu_down_read+0x3c/0x90
 [] ? __sb_start_write+0xb7/0xf0
 [] __sb_start_write+0xb7/0xf0
 [] xfs_trans_alloc+0xe3/0x130 [xfs]
 [] xfs_iomap_write_allocate+0x1f7/0x380 [xfs]
 [] ? xfs_map_blocks+0xe3/0x380 [xfs]
 [] ? rcu_read_lock_sched_held+0x58/0x60
 [] xfs_map_blocks+0x22a/0x380 [xfs]
 [] xfs_do_writepage+0x188/0x6c0 [xfs]
 [] xfs_vm_writepage+0x3b/0x70 [xfs]
 [] pageout.isra.46+0x190/0x380
 [] shrink_page_list+0x9ab/0xa70
 [] shrink_inactive_list+0x252/0x5d0
 [] shrink_node_memcg+0x5af/0x790
 [] shrink_node+0xe1/0x320
 [] kswapd+0x387/0x8b0

Probably false positive? Although when I look at the comment above xfs_sync_sb()
I think that may be sometging like below makes sense, but I know absolutely 
nothing
about fs/ and XFS in particular.

Oleg.


--- x/fs/xfs/xfs_trans.c
+++ x/fs/xfs/xfs_trans.c
@@ -245,7 +245,8 @@ xfs_trans_alloc(
atomic_inc(>m_active_trans);
 
   

Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-06 Thread Johannes Weiner
On Tue, Oct 04, 2016 at 01:48:00PM +0200, Michal Hocko wrote:
> Johannes is already looking into this
> http://lkml.kernel.org/r/20161004093216.ga21...@cmpxchg.org
> 
> On Tue 04-10-16 13:43:43, Oleg Nesterov wrote:
> > because of kernel bug:
> > 
> > [ 2730.242537] run fstests generic/274 at 2016-10-04 05:17:34
> > [ 2730.738352] XFS (loop1): Mounting V5 Filesystem
> > [ 2730.741451] XFS (loop1): Ending clean mount
> > [ 2744.508698] [ cut here ]
> > [ 2744.509190] kernel BUG at ./include/linux/swap.h:276!
> > 
> > static inline void workingset_node_shadows_dec(struct radix_tree_node *node)
> > {
> > VM_BUG_ON(!workingset_node_shadows(node));
> > node->count -= 1U << RADIX_TREE_COUNT_SHIFT;

We tracked this one down and Linus merged a fix for this issue:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d3798ae8c6f3767c726403c2ca6ecc317752c9dd

Let me know if this still fires on kernels with that commit.
Thanks!


Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-06 Thread Johannes Weiner
On Tue, Oct 04, 2016 at 01:48:00PM +0200, Michal Hocko wrote:
> Johannes is already looking into this
> http://lkml.kernel.org/r/20161004093216.ga21...@cmpxchg.org
> 
> On Tue 04-10-16 13:43:43, Oleg Nesterov wrote:
> > because of kernel bug:
> > 
> > [ 2730.242537] run fstests generic/274 at 2016-10-04 05:17:34
> > [ 2730.738352] XFS (loop1): Mounting V5 Filesystem
> > [ 2730.741451] XFS (loop1): Ending clean mount
> > [ 2744.508698] [ cut here ]
> > [ 2744.509190] kernel BUG at ./include/linux/swap.h:276!
> > 
> > static inline void workingset_node_shadows_dec(struct radix_tree_node *node)
> > {
> > VM_BUG_ON(!workingset_node_shadows(node));
> > node->count -= 1U << RADIX_TREE_COUNT_SHIFT;

We tracked this one down and Linus merged a fix for this issue:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d3798ae8c6f3767c726403c2ca6ecc317752c9dd

Let me know if this still fires on kernels with that commit.
Thanks!


Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-06 Thread Jan Kara
On Wed 05-10-16 18:44:32, Oleg Nesterov wrote:
> On 10/05, Dave Chinner wrote:
> >
> > On Tue, Oct 04, 2016 at 01:43:43PM +0200, Oleg Nesterov wrote:
> >
> > > plus the following warnings:
> > >
> > >   [ 1894.500040] run fstests generic/070 at 2016-10-04 05:03:39
> > >   [ 1895.076655] =
> > >   [ 1895.077136] [ INFO: inconsistent lock state ]
> > >   [ 1895.077574] 4.8.0 #1 Not tainted
> > >   [ 1895.077900] -
> > >   [ 1895.078330] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} 
> > > usage.
> > >   [ 1895.078993] fsstress/18239 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > >   [ 1895.079522]  (_nondir_ilock_class){?-}, at: 
> > > [] xfs_ilock+0x165/0x210 [xfs]
> > >   [ 1895.080529] {IN-RECLAIM_FS-W} state was registered at:
> >
> > And that is a bug in the lockdep annotations for memory allocation because 
> > it
> > fails to take into account the current task flags that are set via
> > memalloc_noio_save() to prevent vmalloc from doing GFP_KERNEL allocations. 
> > i.e.
> > in _xfs_buf_map_pages():
> 
> OK, I see...
> 
> I'll re-test with the following change:
> 
>   --- a/kernel/locking/lockdep.c
>   +++ b/kernel/locking/lockdep.c
>   @@ -2867,7 +2867,7 @@ static void __lockdep_trace_alloc(gfp_t gfp_mask, 
> unsigned long flags)
>   return;
>
>   /* We're only interested __GFP_FS allocations for now */
>   -   if (!(gfp_mask & __GFP_FS))
>   +   if ((curr->flags & PF_MEMALLOC_NOIO) || !(gfp_mask & __GFP_FS))
>   return;
> 
> 
> Hmm. This is off-topic and most probably I missed something... but at
> first glance we can simplify/improve the reclaim-fs lockdep annotations:
> 
> 1. add the global "struct lockdep_map reclaim_fs_map"
> 
> 2. change __lockdep_trace_alloc
> 
>   -   mark_held_locks(curr, RECLAIM_FS);
>   +   lock_map_acquire(_fs_map);
>   +   lock_map_release(_fs_map);
> 
> 3. turn lockdep_set/clear_current_reclaim_state() into
> 
>   void lockdep_set_current_reclaim_state(gfp_t gfp_mask)
>   {
>   if (gfp_mask & __GFP_FS)
>   lock_map_acquire(_fs_map);
>   }
> 
>   void lockdep_clear_current_reclaim_state(gfp_t gfp_mask)
>   {
>   if (gfp_mask & __GFP_FS)
>   lock_map_release(_fs_map);
>   }
> 
> and now we can remove task_struct->lockdep_reclaim_gfp and all other
> RECLAIM_FS hacks in lockdep.c. Plus we can easily extend this logic to
> check more GFP_ flags.

Yeah, looks possible to me. I've added Peter to CC since he's most likely
to know.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-06 Thread Jan Kara
On Wed 05-10-16 18:44:32, Oleg Nesterov wrote:
> On 10/05, Dave Chinner wrote:
> >
> > On Tue, Oct 04, 2016 at 01:43:43PM +0200, Oleg Nesterov wrote:
> >
> > > plus the following warnings:
> > >
> > >   [ 1894.500040] run fstests generic/070 at 2016-10-04 05:03:39
> > >   [ 1895.076655] =
> > >   [ 1895.077136] [ INFO: inconsistent lock state ]
> > >   [ 1895.077574] 4.8.0 #1 Not tainted
> > >   [ 1895.077900] -
> > >   [ 1895.078330] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} 
> > > usage.
> > >   [ 1895.078993] fsstress/18239 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > >   [ 1895.079522]  (_nondir_ilock_class){?-}, at: 
> > > [] xfs_ilock+0x165/0x210 [xfs]
> > >   [ 1895.080529] {IN-RECLAIM_FS-W} state was registered at:
> >
> > And that is a bug in the lockdep annotations for memory allocation because 
> > it
> > fails to take into account the current task flags that are set via
> > memalloc_noio_save() to prevent vmalloc from doing GFP_KERNEL allocations. 
> > i.e.
> > in _xfs_buf_map_pages():
> 
> OK, I see...
> 
> I'll re-test with the following change:
> 
>   --- a/kernel/locking/lockdep.c
>   +++ b/kernel/locking/lockdep.c
>   @@ -2867,7 +2867,7 @@ static void __lockdep_trace_alloc(gfp_t gfp_mask, 
> unsigned long flags)
>   return;
>
>   /* We're only interested __GFP_FS allocations for now */
>   -   if (!(gfp_mask & __GFP_FS))
>   +   if ((curr->flags & PF_MEMALLOC_NOIO) || !(gfp_mask & __GFP_FS))
>   return;
> 
> 
> Hmm. This is off-topic and most probably I missed something... but at
> first glance we can simplify/improve the reclaim-fs lockdep annotations:
> 
> 1. add the global "struct lockdep_map reclaim_fs_map"
> 
> 2. change __lockdep_trace_alloc
> 
>   -   mark_held_locks(curr, RECLAIM_FS);
>   +   lock_map_acquire(_fs_map);
>   +   lock_map_release(_fs_map);
> 
> 3. turn lockdep_set/clear_current_reclaim_state() into
> 
>   void lockdep_set_current_reclaim_state(gfp_t gfp_mask)
>   {
>   if (gfp_mask & __GFP_FS)
>   lock_map_acquire(_fs_map);
>   }
> 
>   void lockdep_clear_current_reclaim_state(gfp_t gfp_mask)
>   {
>   if (gfp_mask & __GFP_FS)
>   lock_map_release(_fs_map);
>   }
> 
> and now we can remove task_struct->lockdep_reclaim_gfp and all other
> RECLAIM_FS hacks in lockdep.c. Plus we can easily extend this logic to
> check more GFP_ flags.

Yeah, looks possible to me. I've added Peter to CC since he's most likely
to know.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-05 Thread Oleg Nesterov
On 10/05, Dave Chinner wrote:
>
> On Tue, Oct 04, 2016 at 01:43:43PM +0200, Oleg Nesterov wrote:
>
> > plus the following warnings:
> >
> > [ 1894.500040] run fstests generic/070 at 2016-10-04 05:03:39
> > [ 1895.076655] =
> > [ 1895.077136] [ INFO: inconsistent lock state ]
> > [ 1895.077574] 4.8.0 #1 Not tainted
> > [ 1895.077900] -
> > [ 1895.078330] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} 
> > usage.
> > [ 1895.078993] fsstress/18239 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > [ 1895.079522]  (_nondir_ilock_class){?-}, at: 
> > [] xfs_ilock+0x165/0x210 [xfs]
> > [ 1895.080529] {IN-RECLAIM_FS-W} state was registered at:
>
> And that is a bug in the lockdep annotations for memory allocation because it
> fails to take into account the current task flags that are set via
> memalloc_noio_save() to prevent vmalloc from doing GFP_KERNEL allocations. 
> i.e.
> in _xfs_buf_map_pages():

OK, I see...

I'll re-test with the following change:

--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2867,7 +2867,7 @@ static void __lockdep_trace_alloc(gfp_t gfp_mask, 
unsigned long flags)
return;
 
/* We're only interested __GFP_FS allocations for now */
-   if (!(gfp_mask & __GFP_FS))
+   if ((curr->flags & PF_MEMALLOC_NOIO) || !(gfp_mask & __GFP_FS))
return;


Hmm. This is off-topic and most probably I missed something... but at
first glance we can simplify/improve the reclaim-fs lockdep annotations:

1. add the global "struct lockdep_map reclaim_fs_map"

2. change __lockdep_trace_alloc

-   mark_held_locks(curr, RECLAIM_FS);
+   lock_map_acquire(_fs_map);
+   lock_map_release(_fs_map);

3. turn lockdep_set/clear_current_reclaim_state() into

void lockdep_set_current_reclaim_state(gfp_t gfp_mask)
{
if (gfp_mask & __GFP_FS)
lock_map_acquire(_fs_map);
}

void lockdep_clear_current_reclaim_state(gfp_t gfp_mask)
{
if (gfp_mask & __GFP_FS)
lock_map_release(_fs_map);
}

and now we can remove task_struct->lockdep_reclaim_gfp and all other
RECLAIM_FS hacks in lockdep.c. Plus we can easily extend this logic to
check more GFP_ flags.

No?

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-05 Thread Oleg Nesterov
On 10/05, Dave Chinner wrote:
>
> On Tue, Oct 04, 2016 at 01:43:43PM +0200, Oleg Nesterov wrote:
>
> > plus the following warnings:
> >
> > [ 1894.500040] run fstests generic/070 at 2016-10-04 05:03:39
> > [ 1895.076655] =
> > [ 1895.077136] [ INFO: inconsistent lock state ]
> > [ 1895.077574] 4.8.0 #1 Not tainted
> > [ 1895.077900] -
> > [ 1895.078330] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} 
> > usage.
> > [ 1895.078993] fsstress/18239 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > [ 1895.079522]  (_nondir_ilock_class){?-}, at: 
> > [] xfs_ilock+0x165/0x210 [xfs]
> > [ 1895.080529] {IN-RECLAIM_FS-W} state was registered at:
>
> And that is a bug in the lockdep annotations for memory allocation because it
> fails to take into account the current task flags that are set via
> memalloc_noio_save() to prevent vmalloc from doing GFP_KERNEL allocations. 
> i.e.
> in _xfs_buf_map_pages():

OK, I see...

I'll re-test with the following change:

--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2867,7 +2867,7 @@ static void __lockdep_trace_alloc(gfp_t gfp_mask, 
unsigned long flags)
return;
 
/* We're only interested __GFP_FS allocations for now */
-   if (!(gfp_mask & __GFP_FS))
+   if ((curr->flags & PF_MEMALLOC_NOIO) || !(gfp_mask & __GFP_FS))
return;


Hmm. This is off-topic and most probably I missed something... but at
first glance we can simplify/improve the reclaim-fs lockdep annotations:

1. add the global "struct lockdep_map reclaim_fs_map"

2. change __lockdep_trace_alloc

-   mark_held_locks(curr, RECLAIM_FS);
+   lock_map_acquire(_fs_map);
+   lock_map_release(_fs_map);

3. turn lockdep_set/clear_current_reclaim_state() into

void lockdep_set_current_reclaim_state(gfp_t gfp_mask)
{
if (gfp_mask & __GFP_FS)
lock_map_acquire(_fs_map);
}

void lockdep_clear_current_reclaim_state(gfp_t gfp_mask)
{
if (gfp_mask & __GFP_FS)
lock_map_release(_fs_map);
}

and now we can remove task_struct->lockdep_reclaim_gfp and all other
RECLAIM_FS hacks in lockdep.c. Plus we can easily extend this logic to
check more GFP_ flags.

No?

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-05 Thread Oleg Nesterov
On 10/05, Dave Chinner wrote:
>
> On Tue, Oct 04, 2016 at 06:58:27PM +0200, Oleg Nesterov wrote:
> > I removed this test and then the next run (after reboot) hangs at xfs/073 
> > with
> > a lot of errors in dmesg like
> >
> > XFS (loop2): Failing async write on buffer block 0x9600790. Retrying 
> > async write.
> > blk_update_request: I/O error, dev loop2, sector 8389920
> > loop: Write error at byte offset 4295647232, length 4096.
>
> tests will dump lots of errors into dmesg. That doesn't mean there's
> a problem - many tests are designed to exercise error paths. That,
> however, looks like a loop device problem.
>
> xfs/073 is using loop devices internally itself, so this ends up
> with XFS on loop2 on XFS on loop1. That loop2 device is 100GB in
> size, and the test copies the xfstests source tree to a loopback
> filesystem image in $SCRATCH_DEV mounted on $TEST_DIR/$$.
>
> The issue is, most likely, that your TEST_DIR and SCRATCH_MNT are
> rooted in the xfstests source tree. Hence copying the xfstests
> source tree will also try to copy the 8GB image files into the
> filesystems that the image files contain. Which, clearly, will
> eventually result in ENOSPC errors when writing to the underlying
> loop device...
>
> Put your TEST_DIR and SCRATCHMNT mount points outside the xfstests
> directory, and this should go away.

Yes, thanks, xfs/073 no longer hangs.

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-05 Thread Oleg Nesterov
On 10/05, Dave Chinner wrote:
>
> On Tue, Oct 04, 2016 at 06:58:27PM +0200, Oleg Nesterov wrote:
> > I removed this test and then the next run (after reboot) hangs at xfs/073 
> > with
> > a lot of errors in dmesg like
> >
> > XFS (loop2): Failing async write on buffer block 0x9600790. Retrying 
> > async write.
> > blk_update_request: I/O error, dev loop2, sector 8389920
> > loop: Write error at byte offset 4295647232, length 4096.
>
> tests will dump lots of errors into dmesg. That doesn't mean there's
> a problem - many tests are designed to exercise error paths. That,
> however, looks like a loop device problem.
>
> xfs/073 is using loop devices internally itself, so this ends up
> with XFS on loop2 on XFS on loop1. That loop2 device is 100GB in
> size, and the test copies the xfstests source tree to a loopback
> filesystem image in $SCRATCH_DEV mounted on $TEST_DIR/$$.
>
> The issue is, most likely, that your TEST_DIR and SCRATCH_MNT are
> rooted in the xfstests source tree. Hence copying the xfstests
> source tree will also try to copy the 8GB image files into the
> filesystems that the image files contain. Which, clearly, will
> eventually result in ENOSPC errors when writing to the underlying
> loop device...
>
> Put your TEST_DIR and SCRATCHMNT mount points outside the xfstests
> directory, and this should go away.

Yes, thanks, xfs/073 no longer hangs.

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-04 Thread Dave Chinner
On Tue, Oct 04, 2016 at 06:58:27PM +0200, Oleg Nesterov wrote:
> I removed this test and then the next run (after reboot) hangs at xfs/073 with
> a lot of errors in dmesg like
> 
>   XFS (loop2): Failing async write on buffer block 0x9600790. Retrying 
> async write.
>   blk_update_request: I/O error, dev loop2, sector 8389920
>   loop: Write error at byte offset 4295647232, length 4096.

tests will dump lots of errors into dmesg. That doesn't mean there's
a problem - many tests are designed to exercise error paths. That,
however, looks like a loop device problem.

xfs/073 is using loop devices internally itself, so this ends up
with XFS on loop2 on XFS on loop1. That loop2 device is 100GB in
size, and the test copies the xfstests source tree to a loopback
filesystem image in $SCRATCH_DEV mounted on $TEST_DIR/$$.

The issue is, most likely, that your TEST_DIR and SCRATCH_MNT are
rooted in the xfstests source tree. Hence copying the xfstests
source tree will also try to copy the 8GB image files into the
filesystems that the image files contain. Which, clearly, will
eventually result in ENOSPC errors when writing to the underlying
loop device...

Put your TEST_DIR and SCRATCHMNT mount points outside the xfstests
directory, and this should go away. Most people use /mnt/test and
/mnt/scratch for these

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-04 Thread Dave Chinner
On Tue, Oct 04, 2016 at 06:58:27PM +0200, Oleg Nesterov wrote:
> I removed this test and then the next run (after reboot) hangs at xfs/073 with
> a lot of errors in dmesg like
> 
>   XFS (loop2): Failing async write on buffer block 0x9600790. Retrying 
> async write.
>   blk_update_request: I/O error, dev loop2, sector 8389920
>   loop: Write error at byte offset 4295647232, length 4096.

tests will dump lots of errors into dmesg. That doesn't mean there's
a problem - many tests are designed to exercise error paths. That,
however, looks like a loop device problem.

xfs/073 is using loop devices internally itself, so this ends up
with XFS on loop2 on XFS on loop1. That loop2 device is 100GB in
size, and the test copies the xfstests source tree to a loopback
filesystem image in $SCRATCH_DEV mounted on $TEST_DIR/$$.

The issue is, most likely, that your TEST_DIR and SCRATCH_MNT are
rooted in the xfstests source tree. Hence copying the xfstests
source tree will also try to copy the 8GB image files into the
filesystems that the image files contain. Which, clearly, will
eventually result in ENOSPC errors when writing to the underlying
loop device...

Put your TEST_DIR and SCRATCHMNT mount points outside the xfstests
directory, and this should go away. Most people use /mnt/test and
/mnt/scratch for these

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-04 Thread Dave Chinner
On Tue, Oct 04, 2016 at 01:43:43PM +0200, Oleg Nesterov wrote:
> On 10/03, Oleg Nesterov wrote:
> >
> > On 10/03, Dave Chinner wrote:
> > >
> > > On Fri, Sep 30, 2016 at 07:14:34PM +0200, Oleg Nesterov wrote:
> > > > On 09/27, Oleg Nesterov wrote:
> > > > >
> > > > Jan, I gave up.
> > > >
> > > > Whatever I did xfstests-dev/check reports a lot of failures, kernel 
> > > > bugs,
> > > > and finally either crashes the kernel or hangs.
> > >
> > > Did you run the auto group like I suggested? That's the set of tests
> > > that should complete successfully with minimal failures and without
> > > crashing the machine.
> >
> > OK, I'll reserve a testing machine again.
> 
> Fedora 23 running in KVM guest
> 
> kernel v4.8 clean, see .config below
> 
> xfstests from git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> last commit 0bbd20b104765c75faaf8e28a54c32df505ce7fd ("btrfs: test
> free space tree mount options")
> 
> script:
> 
>   dd if=/dev/zero of=TEST.img bs=1MiB count=8192
>   dd if=/dev/zero of=SCRATCH.img bs=1MiB count=8192
> 
>   losetup --find --show TEST.img
>   losetup --find --show SCRATCH.img
> 
>   mkfs.xfs -f /dev/loop0
>   mkfs.xfs -f /dev/loop1
> 
>   mkdir -p TEST SCRATCH
> 
>   mount /dev/loop0 TEST
>   mount /dev/loop1 SCRATCH
> 
>   TEST_DEV=/dev/loop0 TEST_DIR=TEST SCRATCH_DEV=/dev/loop1 
> SCRATCH_MNT=SCRATCH ./check -g auto

Looks sane.

> This time it hangs after generic/274:
> 
>   --- tests/generic/274.out   2016-10-04 04:23:24.209006171 -0400
>   +++ /root/XFS/xfstests-dev/results//generic/274.out.bad 2016-10-04 
> 05:17:49.291742498 -0400
>   @@ -3,3 +3,4 @@
>   preallocation test
>   --
>   done
>   +./common/rc: line 344:  4693 Segmentation fault  exit
>   ...
>   (Run 'diff -u tests/generic/274.out 
> /root/XFS/xfstests-dev/results//generic/274.out.bad'  to see the entire diff)
> 
> because of kernel bug:
> 
>   [ 2730.242537] run fstests generic/274 at 2016-10-04 05:17:34
>   [ 2730.738352] XFS (loop1): Mounting V5 Filesystem
>   [ 2730.741451] XFS (loop1): Ending clean mount
>   [ 2744.508698] [ cut here ]
>   [ 2744.509190] kernel BUG at ./include/linux/swap.h:276!
> 
> static inline void workingset_node_shadows_dec(struct radix_tree_node *node)
> {
>   VM_BUG_ON(!workingset_node_shadows(node));
>   node->count -= 1U << RADIX_TREE_COUNT_SHIFT;
> }

Linus had a massive rant about this yesterday. Regression was
introduced between v4.8-rc8 and v4.8 and it took Linus's machine
down. xfstests found a filesystem related kernel bug, as it's
supposed to.

> plus the following warnings:
> 
>   [ 1894.500040] run fstests generic/070 at 2016-10-04 05:03:39
>   [ 1895.076655] =
>   [ 1895.077136] [ INFO: inconsistent lock state ]
>   [ 1895.077574] 4.8.0 #1 Not tainted
>   [ 1895.077900] -
>   [ 1895.078330] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} 
> usage.
>   [ 1895.078993] fsstress/18239 [HC0[0]:SC0[0]:HE1:SE1] takes:
>   [ 1895.079522]  (_nondir_ilock_class){?-}, at: 
> [] xfs_ilock+0x165/0x210 [xfs]
>   [ 1895.080529] {IN-RECLAIM_FS-W} state was registered at:

[snip kswapd doing XFS inode reclaim]

>   [ 1895.102565] CPU: 0 PID: 18239 Comm: fsstress Not tainted 4.8.0 #1
>   [ 1895.103160] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>   [ 1895.103721]  0086 8f20d9c8 880038b73818 
> b6449133
>   [ 1895.104486]  880038a5cd00 b78bcd70 880038b73868 
> b6106c75
>   [ 1895.105257]   0001 8801 
> 0008
>   [ 1895.106034] Call Trace:
>   [ 1895.106281]  [] dump_stack+0x85/0xc2
>   [ 1895.106789]  [] print_usage_bug+0x215/0x240
>   [ 1895.107352]  [] mark_lock+0x58b/0x650
>   [ 1895.107888]  [] ? check_usage_forwards+0x160/0x160
>   [ 1895.108506]  [] mark_held_locks+0x6f/0xa0
>   [ 1895.109085]  [] lockdep_trace_alloc+0xca/0x110
>   [ 1895.109695]  [] 
> kmem_cache_alloc_node_trace+0x39/0x2a0
>   [ 1895.110414]  [] ? vm_map_ram+0x2de/0x490
>   [ 1895.110995]  [] vm_map_ram+0x2de/0x490
>   [ 1895.111536]  [] ? vm_map_ram+0x4b/0x490
>   [ 1895.112142]  [] _xfs_buf_map_pages+0x6c/0x110 [xfs]
>   [ 1895.112837]  [] xfs_buf_get_map+0x2c7/0x470 [xfs]
>   [ 1895.113500]  [] xfs_attr_rmtval_set+0x2c9/0x450 
> [xfs]
>   [ 1895.114235]  [] xfs_attr_node_addname+0x4c9/0x5d0 
> [xfs]
>   [ 1895.114948]  [] xfs_attr_set+0x3dc/0x460 [xfs]
>   [ 1895.115593]  [] xfs_xattr_set+0x4f/0x90 [xfs]
>   [ 1895.116221]  [] generic_setxattr+0x70/0x80
>   [ 1895.116798]  [] __vfs_setxattr_noperm+0xaf/0x1a0
>   [ 1895.117445]  [] ? security_inode_setxattr+0x65/0xb0
>   [ 1895.118121]  [] vfs_setxattr+0xa7/0xb0
>   [ 

Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-04 Thread Dave Chinner
On Tue, Oct 04, 2016 at 01:43:43PM +0200, Oleg Nesterov wrote:
> On 10/03, Oleg Nesterov wrote:
> >
> > On 10/03, Dave Chinner wrote:
> > >
> > > On Fri, Sep 30, 2016 at 07:14:34PM +0200, Oleg Nesterov wrote:
> > > > On 09/27, Oleg Nesterov wrote:
> > > > >
> > > > Jan, I gave up.
> > > >
> > > > Whatever I did xfstests-dev/check reports a lot of failures, kernel 
> > > > bugs,
> > > > and finally either crashes the kernel or hangs.
> > >
> > > Did you run the auto group like I suggested? That's the set of tests
> > > that should complete successfully with minimal failures and without
> > > crashing the machine.
> >
> > OK, I'll reserve a testing machine again.
> 
> Fedora 23 running in KVM guest
> 
> kernel v4.8 clean, see .config below
> 
> xfstests from git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> last commit 0bbd20b104765c75faaf8e28a54c32df505ce7fd ("btrfs: test
> free space tree mount options")
> 
> script:
> 
>   dd if=/dev/zero of=TEST.img bs=1MiB count=8192
>   dd if=/dev/zero of=SCRATCH.img bs=1MiB count=8192
> 
>   losetup --find --show TEST.img
>   losetup --find --show SCRATCH.img
> 
>   mkfs.xfs -f /dev/loop0
>   mkfs.xfs -f /dev/loop1
> 
>   mkdir -p TEST SCRATCH
> 
>   mount /dev/loop0 TEST
>   mount /dev/loop1 SCRATCH
> 
>   TEST_DEV=/dev/loop0 TEST_DIR=TEST SCRATCH_DEV=/dev/loop1 
> SCRATCH_MNT=SCRATCH ./check -g auto

Looks sane.

> This time it hangs after generic/274:
> 
>   --- tests/generic/274.out   2016-10-04 04:23:24.209006171 -0400
>   +++ /root/XFS/xfstests-dev/results//generic/274.out.bad 2016-10-04 
> 05:17:49.291742498 -0400
>   @@ -3,3 +3,4 @@
>   preallocation test
>   --
>   done
>   +./common/rc: line 344:  4693 Segmentation fault  exit
>   ...
>   (Run 'diff -u tests/generic/274.out 
> /root/XFS/xfstests-dev/results//generic/274.out.bad'  to see the entire diff)
> 
> because of kernel bug:
> 
>   [ 2730.242537] run fstests generic/274 at 2016-10-04 05:17:34
>   [ 2730.738352] XFS (loop1): Mounting V5 Filesystem
>   [ 2730.741451] XFS (loop1): Ending clean mount
>   [ 2744.508698] [ cut here ]
>   [ 2744.509190] kernel BUG at ./include/linux/swap.h:276!
> 
> static inline void workingset_node_shadows_dec(struct radix_tree_node *node)
> {
>   VM_BUG_ON(!workingset_node_shadows(node));
>   node->count -= 1U << RADIX_TREE_COUNT_SHIFT;
> }

Linus had a massive rant about this yesterday. Regression was
introduced between v4.8-rc8 and v4.8 and it took Linus's machine
down. xfstests found a filesystem related kernel bug, as it's
supposed to.

> plus the following warnings:
> 
>   [ 1894.500040] run fstests generic/070 at 2016-10-04 05:03:39
>   [ 1895.076655] =
>   [ 1895.077136] [ INFO: inconsistent lock state ]
>   [ 1895.077574] 4.8.0 #1 Not tainted
>   [ 1895.077900] -
>   [ 1895.078330] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} 
> usage.
>   [ 1895.078993] fsstress/18239 [HC0[0]:SC0[0]:HE1:SE1] takes:
>   [ 1895.079522]  (_nondir_ilock_class){?-}, at: 
> [] xfs_ilock+0x165/0x210 [xfs]
>   [ 1895.080529] {IN-RECLAIM_FS-W} state was registered at:

[snip kswapd doing XFS inode reclaim]

>   [ 1895.102565] CPU: 0 PID: 18239 Comm: fsstress Not tainted 4.8.0 #1
>   [ 1895.103160] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>   [ 1895.103721]  0086 8f20d9c8 880038b73818 
> b6449133
>   [ 1895.104486]  880038a5cd00 b78bcd70 880038b73868 
> b6106c75
>   [ 1895.105257]   0001 8801 
> 0008
>   [ 1895.106034] Call Trace:
>   [ 1895.106281]  [] dump_stack+0x85/0xc2
>   [ 1895.106789]  [] print_usage_bug+0x215/0x240
>   [ 1895.107352]  [] mark_lock+0x58b/0x650
>   [ 1895.107888]  [] ? check_usage_forwards+0x160/0x160
>   [ 1895.108506]  [] mark_held_locks+0x6f/0xa0
>   [ 1895.109085]  [] lockdep_trace_alloc+0xca/0x110
>   [ 1895.109695]  [] 
> kmem_cache_alloc_node_trace+0x39/0x2a0
>   [ 1895.110414]  [] ? vm_map_ram+0x2de/0x490
>   [ 1895.110995]  [] vm_map_ram+0x2de/0x490
>   [ 1895.111536]  [] ? vm_map_ram+0x4b/0x490
>   [ 1895.112142]  [] _xfs_buf_map_pages+0x6c/0x110 [xfs]
>   [ 1895.112837]  [] xfs_buf_get_map+0x2c7/0x470 [xfs]
>   [ 1895.113500]  [] xfs_attr_rmtval_set+0x2c9/0x450 
> [xfs]
>   [ 1895.114235]  [] xfs_attr_node_addname+0x4c9/0x5d0 
> [xfs]
>   [ 1895.114948]  [] xfs_attr_set+0x3dc/0x460 [xfs]
>   [ 1895.115593]  [] xfs_xattr_set+0x4f/0x90 [xfs]
>   [ 1895.116221]  [] generic_setxattr+0x70/0x80
>   [ 1895.116798]  [] __vfs_setxattr_noperm+0xaf/0x1a0
>   [ 1895.117445]  [] ? security_inode_setxattr+0x65/0xb0
>   [ 1895.118121]  [] vfs_setxattr+0xa7/0xb0
>   [ 

Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-04 Thread Oleg Nesterov
On 10/04, Oleg Nesterov wrote:
>
> This time it hangs after generic/274:
>
>   --- tests/generic/274.out   2016-10-04 04:23:24.209006171 -0400
>   +++ /root/XFS/xfstests-dev/results//generic/274.out.bad 2016-10-04 
> 05:17:49.291742498 -0400
>   @@ -3,3 +3,4 @@
>   preallocation test
>   --
>   done
>   +./common/rc: line 344:  4693 Segmentation fault  exit
>   ...
>   (Run 'diff -u tests/generic/274.out 
> /root/XFS/xfstests-dev/results//generic/274.out.bad'  to see the entire diff)
>
> because of kernel bug:
>
>   [ 2730.242537] run fstests generic/274 at 2016-10-04 05:17:34
>   [ 2730.738352] XFS (loop1): Mounting V5 Filesystem
>   [ 2730.741451] XFS (loop1): Ending clean mount
>   [ 2744.508698] [ cut here ]
>   [ 2744.509190] kernel BUG at ./include/linux/swap.h:276!

I removed this test and then the next run (after reboot) hangs at xfs/073 with
a lot of errors in dmesg like

XFS (loop2): Failing async write on buffer block 0x9600790. Retrying 
async write.
blk_update_request: I/O error, dev loop2, sector 8389920
loop: Write error at byte offset 4295647232, length 4096.

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-04 Thread Oleg Nesterov
On 10/04, Oleg Nesterov wrote:
>
> This time it hangs after generic/274:
>
>   --- tests/generic/274.out   2016-10-04 04:23:24.209006171 -0400
>   +++ /root/XFS/xfstests-dev/results//generic/274.out.bad 2016-10-04 
> 05:17:49.291742498 -0400
>   @@ -3,3 +3,4 @@
>   preallocation test
>   --
>   done
>   +./common/rc: line 344:  4693 Segmentation fault  exit
>   ...
>   (Run 'diff -u tests/generic/274.out 
> /root/XFS/xfstests-dev/results//generic/274.out.bad'  to see the entire diff)
>
> because of kernel bug:
>
>   [ 2730.242537] run fstests generic/274 at 2016-10-04 05:17:34
>   [ 2730.738352] XFS (loop1): Mounting V5 Filesystem
>   [ 2730.741451] XFS (loop1): Ending clean mount
>   [ 2744.508698] [ cut here ]
>   [ 2744.509190] kernel BUG at ./include/linux/swap.h:276!

I removed this test and then the next run (after reboot) hangs at xfs/073 with
a lot of errors in dmesg like

XFS (loop2): Failing async write on buffer block 0x9600790. Retrying 
async write.
blk_update_request: I/O error, dev loop2, sector 8389920
loop: Write error at byte offset 4295647232, length 4096.

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-03 Thread Oleg Nesterov
On 10/03, Dave Chinner wrote:
>
> On Fri, Sep 30, 2016 at 07:14:34PM +0200, Oleg Nesterov wrote:
> > On 09/27, Oleg Nesterov wrote:
> > >
> > >
> > > It seems that generic/001 just hangs on my laptop. With or without this 
> > > change.
> > > Or perhaps I didn't wait enough...
> >
> > /usr/bin/awk spins in user-mode forever, according to strace it doesn't do
> > any syscalls. I didn't even try to investigate.
>
> Are you using gawk, or some other version of awk? gawk doesn't have
> any problems

gawk, awk is symlink. And somehow this depends on userspace environment,
it doesn't hang on the full-blown Fedora 23.

> > Jan, I gave up.
> >
> > Whatever I did xfstests-dev/check reports a lot of failures, kernel bugs,
> > and finally either crashes the kernel or hangs.
>
> Did you run the auto group like I suggested? That's the set of tests
> that should complete successfully with minimal failures and without
> crashing the machine.

OK, I'll reserve a testing machine again.

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-03 Thread Oleg Nesterov
On 10/03, Dave Chinner wrote:
>
> On Fri, Sep 30, 2016 at 07:14:34PM +0200, Oleg Nesterov wrote:
> > On 09/27, Oleg Nesterov wrote:
> > >
> > >
> > > It seems that generic/001 just hangs on my laptop. With or without this 
> > > change.
> > > Or perhaps I didn't wait enough...
> >
> > /usr/bin/awk spins in user-mode forever, according to strace it doesn't do
> > any syscalls. I didn't even try to investigate.
>
> Are you using gawk, or some other version of awk? gawk doesn't have
> any problems

gawk, awk is symlink. And somehow this depends on userspace environment,
it doesn't hang on the full-blown Fedora 23.

> > Jan, I gave up.
> >
> > Whatever I did xfstests-dev/check reports a lot of failures, kernel bugs,
> > and finally either crashes the kernel or hangs.
>
> Did you run the auto group like I suggested? That's the set of tests
> that should complete successfully with minimal failures and without
> crashing the machine.

OK, I'll reserve a testing machine again.

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-02 Thread Dave Chinner
On Fri, Sep 30, 2016 at 07:14:34PM +0200, Oleg Nesterov wrote:
> On 09/27, Oleg Nesterov wrote:
> >
> > On 09/27, Jan Kara wrote:
> > >
> > > You can run either:
> > >
> > >   ./check -g freeze
> >
> > passed all 6 tests.
> >
> > > to check just the freezing tests or
> > >
> > >   ./check
> > >
> > > to run all sensible tests which is what I'd do (but it will take couple of
> > > hours to pass). If that passes, chances are good there are no easy false
> > > positives.
> >
> > It seems that generic/001 just hangs on my laptop. With or without this 
> > change.
> > Or perhaps I didn't wait enough...
> 
> /usr/bin/awk spins in user-mode forever, according to strace it doesn't do
> any syscalls. I didn't even try to investigate.

Are you using gawk, or some other version of awk? gawk doesn't have
any problems (it is listed as a prereq package in the README file)
that anyone has reported, and there are a lot of people using it...

> > Or perhaps something is wrong with my very
> > limited testing environment. I'll reserve a testing machine tomorrow.
> 
> Jan, I gave up.
> 
> Whatever I did xfstests-dev/check reports a lot of failures, kernel bugs,
> and finally either crashes the kernel or hangs.

Did you run the auto group like I suggested? That's the set of tests
that should complete successfully with minimal failures and without
crashing the machine. If you're running this group and there's
failures, hangs and crashes all over the place, then you need to
start reporting bugs because that should not be happening on any
kernel

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-10-02 Thread Dave Chinner
On Fri, Sep 30, 2016 at 07:14:34PM +0200, Oleg Nesterov wrote:
> On 09/27, Oleg Nesterov wrote:
> >
> > On 09/27, Jan Kara wrote:
> > >
> > > You can run either:
> > >
> > >   ./check -g freeze
> >
> > passed all 6 tests.
> >
> > > to check just the freezing tests or
> > >
> > >   ./check
> > >
> > > to run all sensible tests which is what I'd do (but it will take couple of
> > > hours to pass). If that passes, chances are good there are no easy false
> > > positives.
> >
> > It seems that generic/001 just hangs on my laptop. With or without this 
> > change.
> > Or perhaps I didn't wait enough...
> 
> /usr/bin/awk spins in user-mode forever, according to strace it doesn't do
> any syscalls. I didn't even try to investigate.

Are you using gawk, or some other version of awk? gawk doesn't have
any problems (it is listed as a prereq package in the README file)
that anyone has reported, and there are a lot of people using it...

> > Or perhaps something is wrong with my very
> > limited testing environment. I'll reserve a testing machine tomorrow.
> 
> Jan, I gave up.
> 
> Whatever I did xfstests-dev/check reports a lot of failures, kernel bugs,
> and finally either crashes the kernel or hangs.

Did you run the auto group like I suggested? That's the set of tests
that should complete successfully with minimal failures and without
crashing the machine. If you're running this group and there's
failures, hangs and crashes all over the place, then you need to
start reporting bugs because that should not be happening on any
kernel

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-09-30 Thread Oleg Nesterov
On 09/27, Oleg Nesterov wrote:
>
> On 09/27, Jan Kara wrote:
> >
> > You can run either:
> >
> > ./check -g freeze
>
> passed all 6 tests.
>
> > to check just the freezing tests or
> >
> > ./check
> >
> > to run all sensible tests which is what I'd do (but it will take couple of
> > hours to pass). If that passes, chances are good there are no easy false
> > positives.
>
> It seems that generic/001 just hangs on my laptop. With or without this 
> change.
> Or perhaps I didn't wait enough...

/usr/bin/awk spins in user-mode forever, according to strace it doesn't do
any syscalls. I didn't even try to investigate.

> Or perhaps something is wrong with my very
> limited testing environment. I'll reserve a testing machine tomorrow.

Jan, I gave up.

Whatever I did xfstests-dev/check reports a lot of failures, kernel bugs,
and finally either crashes the kernel or hangs.

With or without this change, and I didn't notice any warning from lockdep.
So I hope we can apply this patch. At least 1/2 which fixes a bug.

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-09-30 Thread Oleg Nesterov
On 09/27, Oleg Nesterov wrote:
>
> On 09/27, Jan Kara wrote:
> >
> > You can run either:
> >
> > ./check -g freeze
>
> passed all 6 tests.
>
> > to check just the freezing tests or
> >
> > ./check
> >
> > to run all sensible tests which is what I'd do (but it will take couple of
> > hours to pass). If that passes, chances are good there are no easy false
> > positives.
>
> It seems that generic/001 just hangs on my laptop. With or without this 
> change.
> Or perhaps I didn't wait enough...

/usr/bin/awk spins in user-mode forever, according to strace it doesn't do
any syscalls. I didn't even try to investigate.

> Or perhaps something is wrong with my very
> limited testing environment. I'll reserve a testing machine tomorrow.

Jan, I gave up.

Whatever I did xfstests-dev/check reports a lot of failures, kernel bugs,
and finally either crashes the kernel or hangs.

With or without this change, and I didn't notice any warning from lockdep.
So I hope we can apply this patch. At least 1/2 which fixes a bug.

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-09-27 Thread Oleg Nesterov
On 09/27, Jan Kara wrote:
>
> On Mon 26-09-16 18:55:25, Oleg Nesterov wrote:
> >
> > Heh ;) if only I knew how to test this... I ran the following script
> > under qemu
> >
> > mkfs.xfs -f /dev/vda
> > mkfs.xfs -f /dev/vdb
> >
> > mkdir -p TEST SCRATCH
> >
> > TEST_DEV=/dev/vda TEST_DIR=TEST SCRATCH_DEV=/dev/vdb 
> > SCRATCH_MNT=SCRATCH \
> > ./check `grep -il freeze tests/*/???`
>
> You can run either:
>
>   ./check -g freeze

passed all 6 tests.

> to check just the freezing tests or
>
>   ./check
>
> to run all sensible tests which is what I'd do (but it will take couple of
> hours to pass). If that passes, chances are good there are no easy false
> positives.

It seems that generic/001 just hangs on my laptop. With or without this change.
Or perhaps I didn't wait enough... Or perhaps something is wrong with my very
limited testing environment. I'll reserve a testing machine tomorrow.

> > And yes, I'm afraid this change can uncover some false positives later.
> > But at the same time potentially it can find the real problems.
>
> Well, sure it's not an end of world if there is some false positive - we
> can just revert the change - but lockdep false positives are always
> annoying because they take time to analyze and until they are fixed, you
> are unable to see other probles found by lockdep...

Yes, yes, agreed.

> > It would be nice to remove another hack in __sb_start_write under
> > ifdef(CONFIG_LOCKDEP), but iirc XFS actually takes the same rw_sem twice
> > for reading, so we can't do this.
>
> Yes, and I don't really consider this a hack.

Ah, sorry, I didn't try to blame XFS/fs. I meant, this "force_trylock" hack
doesn't look nice. Perhaps we can use rwsem_acquire_nest() instead.

> Reviewed-by: Jan Kara 

Thanks!

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-09-27 Thread Oleg Nesterov
On 09/27, Jan Kara wrote:
>
> On Mon 26-09-16 18:55:25, Oleg Nesterov wrote:
> >
> > Heh ;) if only I knew how to test this... I ran the following script
> > under qemu
> >
> > mkfs.xfs -f /dev/vda
> > mkfs.xfs -f /dev/vdb
> >
> > mkdir -p TEST SCRATCH
> >
> > TEST_DEV=/dev/vda TEST_DIR=TEST SCRATCH_DEV=/dev/vdb 
> > SCRATCH_MNT=SCRATCH \
> > ./check `grep -il freeze tests/*/???`
>
> You can run either:
>
>   ./check -g freeze

passed all 6 tests.

> to check just the freezing tests or
>
>   ./check
>
> to run all sensible tests which is what I'd do (but it will take couple of
> hours to pass). If that passes, chances are good there are no easy false
> positives.

It seems that generic/001 just hangs on my laptop. With or without this change.
Or perhaps I didn't wait enough... Or perhaps something is wrong with my very
limited testing environment. I'll reserve a testing machine tomorrow.

> > And yes, I'm afraid this change can uncover some false positives later.
> > But at the same time potentially it can find the real problems.
>
> Well, sure it's not an end of world if there is some false positive - we
> can just revert the change - but lockdep false positives are always
> annoying because they take time to analyze and until they are fixed, you
> are unable to see other probles found by lockdep...

Yes, yes, agreed.

> > It would be nice to remove another hack in __sb_start_write under
> > ifdef(CONFIG_LOCKDEP), but iirc XFS actually takes the same rw_sem twice
> > for reading, so we can't do this.
>
> Yes, and I don't really consider this a hack.

Ah, sorry, I didn't try to blame XFS/fs. I meant, this "force_trylock" hack
doesn't look nice. Perhaps we can use rwsem_acquire_nest() instead.

> Reviewed-by: Jan Kara 

Thanks!

Oleg.



Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-09-27 Thread Dave Chinner
On Tue, Sep 27, 2016 at 08:51:35AM +0200, Jan Kara wrote:
> On Mon 26-09-16 18:55:25, Oleg Nesterov wrote:
> > On 09/26, Jan Kara wrote:
> > >
> > > On Mon 26-09-16 18:08:06, Oleg Nesterov wrote:
> > > > +/*
> > > > + * Tell lockdep we are holding these locks before we call 
> > > > ->unfreeze_fs(sb).
> > > > + */
> > > > +static void sb_freeze_acquire(struct super_block *sb)
> > >
> > > Can we call this lockdep_sb_freeze_acquire() or something like that so 
> > > that
> > > it is clear this is only about lockdep annotations? Similarly with
> > > sb_freeze_unlock()...
> > 
> > OK, thanks, done. See V2 below.
> > 
> > > and I hope you really tested
> > > there are no more lockdep false positives ;).
> > 
> > Heh ;) if only I knew how to test this... I ran the following script
> > under qemu
> > 
> > mkfs.xfs -f /dev/vda
> > mkfs.xfs -f /dev/vdb
> > 
> > mkdir -p TEST SCRATCH
> > 
> > TEST_DEV=/dev/vda TEST_DIR=TEST SCRATCH_DEV=/dev/vdb 
> > SCRATCH_MNT=SCRATCH \
> > ./check `grep -il freeze tests/*/???`
> 
> You can run either:
> 
>   ./check -g freeze
> 
> to check just the freezing tests or
> 
>   ./check

Better for regression testing is:

check -g auto

so that is skips all the tests that are broken or likely to crash
the machine on some debug check.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-09-27 Thread Dave Chinner
On Tue, Sep 27, 2016 at 08:51:35AM +0200, Jan Kara wrote:
> On Mon 26-09-16 18:55:25, Oleg Nesterov wrote:
> > On 09/26, Jan Kara wrote:
> > >
> > > On Mon 26-09-16 18:08:06, Oleg Nesterov wrote:
> > > > +/*
> > > > + * Tell lockdep we are holding these locks before we call 
> > > > ->unfreeze_fs(sb).
> > > > + */
> > > > +static void sb_freeze_acquire(struct super_block *sb)
> > >
> > > Can we call this lockdep_sb_freeze_acquire() or something like that so 
> > > that
> > > it is clear this is only about lockdep annotations? Similarly with
> > > sb_freeze_unlock()...
> > 
> > OK, thanks, done. See V2 below.
> > 
> > > and I hope you really tested
> > > there are no more lockdep false positives ;).
> > 
> > Heh ;) if only I knew how to test this... I ran the following script
> > under qemu
> > 
> > mkfs.xfs -f /dev/vda
> > mkfs.xfs -f /dev/vdb
> > 
> > mkdir -p TEST SCRATCH
> > 
> > TEST_DEV=/dev/vda TEST_DIR=TEST SCRATCH_DEV=/dev/vdb 
> > SCRATCH_MNT=SCRATCH \
> > ./check `grep -il freeze tests/*/???`
> 
> You can run either:
> 
>   ./check -g freeze
> 
> to check just the freezing tests or
> 
>   ./check

Better for regression testing is:

check -g auto

so that is skips all the tests that are broken or likely to crash
the machine on some debug check.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-09-27 Thread Jan Kara
On Mon 26-09-16 18:55:25, Oleg Nesterov wrote:
> On 09/26, Jan Kara wrote:
> >
> > On Mon 26-09-16 18:08:06, Oleg Nesterov wrote:
> > > +/*
> > > + * Tell lockdep we are holding these locks before we call 
> > > ->unfreeze_fs(sb).
> > > + */
> > > +static void sb_freeze_acquire(struct super_block *sb)
> >
> > Can we call this lockdep_sb_freeze_acquire() or something like that so that
> > it is clear this is only about lockdep annotations? Similarly with
> > sb_freeze_unlock()...
> 
> OK, thanks, done. See V2 below.
> 
> > and I hope you really tested
> > there are no more lockdep false positives ;).
> 
> Heh ;) if only I knew how to test this... I ran the following script
> under qemu
> 
>   mkfs.xfs -f /dev/vda
>   mkfs.xfs -f /dev/vdb
> 
>   mkdir -p TEST SCRATCH
> 
>   TEST_DEV=/dev/vda TEST_DIR=TEST SCRATCH_DEV=/dev/vdb 
> SCRATCH_MNT=SCRATCH \
>   ./check `grep -il freeze tests/*/???`

You can run either:

./check -g freeze

to check just the freezing tests or

./check

to run all sensible tests which is what I'd do (but it will take couple of
hours to pass). If that passes, chances are good there are no easy false
positives.

> And yes, I'm afraid this change can uncover some false positives later.
> But at the same time potentially it can find the real problems.

Well, sure it's not an end of world if there is some false positive - we
can just revert the change - but lockdep false positives are always
annoying because they take time to analyze and until they are fixed, you
are unable to see other probles found by lockdep...

> It would be nice to remove another hack in __sb_start_write under
> ifdef(CONFIG_LOCKDEP), but iirc XFS actually takes the same rw_sem twice
> for reading, so we can't do this.

Yes, and I don't really consider this a hack. Filesystem freezing was never
meant to have all the properties of classical rwsems in kernel - I just
happened to notice that the semantics is close to rwsems and so decided to
model it as such in lockdep to get some lockdep verification for it. In
particular "read lock recursion" has always been OK with freeze protection
by design and I don't see strong motivation for changing that.

> ---
> From 70e774533ab6319f9b90a490b036150ad9604af7 Mon Sep 17 00:00:00 2001
> From: Oleg Nesterov <o...@redhat.com>
> Date: Mon, 26 Sep 2016 17:23:24 +0200
> Subject: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and 
> thaw_super() paths
> 
> sb_wait_write()->percpu_rwsem_release() fools lockdep to avoid the
> false-positives. Now that xfs was fixed by Dave's commit dbad7c993053
> ("xfs: stop holding ILOCK over filldir callbacks") we can remove it and
> change freeze_super() and thaw_super() to run with s_writers.rw_sem locks
> held; we add two trivial helpers for that, lockdep_sb_freeze_release()
> and lockdep_sb_freeze_acquire().
> 
> xfstests-dev/check `grep -il freeze tests/*/???` does not trigger any
> warning from lockdep.
> 
> Signed-off-by: Oleg Nesterov <o...@redhat.com>

Thanks. The patch looks good. You can add:

Reviewed-by: Jan Kara <j...@suse.cz>

Honza
> ---
>  fs/super.c | 37 +
>  1 file changed, 25 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index 2549896c..0447afe 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1214,25 +1214,34 @@ EXPORT_SYMBOL(__sb_start_write);
>  static void sb_wait_write(struct super_block *sb, int level)
>  {
>   percpu_down_write(sb->s_writers.rw_sem + level-1);
> - /*
> -  * We are going to return to userspace and forget about this lock, the
> -  * ownership goes to the caller of thaw_super() which does unlock.
> -  *
> -  * FIXME: we should do this before return from freeze_super() after we
> -  * called sync_filesystem(sb) and s_op->freeze_fs(sb), and thaw_super()
> -  * should re-acquire these locks before s_op->unfreeze_fs(sb). However
> -  * this leads to lockdep false-positives, so currently we do the early
> -  * release right after acquire.
> -  */
> - percpu_rwsem_release(sb->s_writers.rw_sem + level-1, 0, _THIS_IP_);
>  }
>  
> -static void sb_freeze_unlock(struct super_block *sb)
> +/*
> + * We are going to return to userspace and forget about these locks, the
> + * ownership goes to the caller of thaw_super() which does unlock().
> + */
> +static void lockdep_sb_freeze_release(struct super_block *sb)
> +{
> + int level;
> +
> + for (level = 

Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-09-27 Thread Jan Kara
On Mon 26-09-16 18:55:25, Oleg Nesterov wrote:
> On 09/26, Jan Kara wrote:
> >
> > On Mon 26-09-16 18:08:06, Oleg Nesterov wrote:
> > > +/*
> > > + * Tell lockdep we are holding these locks before we call 
> > > ->unfreeze_fs(sb).
> > > + */
> > > +static void sb_freeze_acquire(struct super_block *sb)
> >
> > Can we call this lockdep_sb_freeze_acquire() or something like that so that
> > it is clear this is only about lockdep annotations? Similarly with
> > sb_freeze_unlock()...
> 
> OK, thanks, done. See V2 below.
> 
> > and I hope you really tested
> > there are no more lockdep false positives ;).
> 
> Heh ;) if only I knew how to test this... I ran the following script
> under qemu
> 
>   mkfs.xfs -f /dev/vda
>   mkfs.xfs -f /dev/vdb
> 
>   mkdir -p TEST SCRATCH
> 
>   TEST_DEV=/dev/vda TEST_DIR=TEST SCRATCH_DEV=/dev/vdb 
> SCRATCH_MNT=SCRATCH \
>   ./check `grep -il freeze tests/*/???`

You can run either:

./check -g freeze

to check just the freezing tests or

./check

to run all sensible tests which is what I'd do (but it will take couple of
hours to pass). If that passes, chances are good there are no easy false
positives.

> And yes, I'm afraid this change can uncover some false positives later.
> But at the same time potentially it can find the real problems.

Well, sure it's not an end of world if there is some false positive - we
can just revert the change - but lockdep false positives are always
annoying because they take time to analyze and until they are fixed, you
are unable to see other probles found by lockdep...

> It would be nice to remove another hack in __sb_start_write under
> ifdef(CONFIG_LOCKDEP), but iirc XFS actually takes the same rw_sem twice
> for reading, so we can't do this.

Yes, and I don't really consider this a hack. Filesystem freezing was never
meant to have all the properties of classical rwsems in kernel - I just
happened to notice that the semantics is close to rwsems and so decided to
model it as such in lockdep to get some lockdep verification for it. In
particular "read lock recursion" has always been OK with freeze protection
by design and I don't see strong motivation for changing that.

> ---------------
> From 70e774533ab6319f9b90a490b036150ad9604af7 Mon Sep 17 00:00:00 2001
> From: Oleg Nesterov 
> Date: Mon, 26 Sep 2016 17:23:24 +0200
> Subject: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and 
> thaw_super() paths
> 
> sb_wait_write()->percpu_rwsem_release() fools lockdep to avoid the
> false-positives. Now that xfs was fixed by Dave's commit dbad7c993053
> ("xfs: stop holding ILOCK over filldir callbacks") we can remove it and
> change freeze_super() and thaw_super() to run with s_writers.rw_sem locks
> held; we add two trivial helpers for that, lockdep_sb_freeze_release()
> and lockdep_sb_freeze_acquire().
> 
> xfstests-dev/check `grep -il freeze tests/*/???` does not trigger any
> warning from lockdep.
> 
> Signed-off-by: Oleg Nesterov 

Thanks. The patch looks good. You can add:

Reviewed-by: Jan Kara 

Honza
> ---
>  fs/super.c | 37 +
>  1 file changed, 25 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index 2549896c..0447afe 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1214,25 +1214,34 @@ EXPORT_SYMBOL(__sb_start_write);
>  static void sb_wait_write(struct super_block *sb, int level)
>  {
>   percpu_down_write(sb->s_writers.rw_sem + level-1);
> - /*
> -  * We are going to return to userspace and forget about this lock, the
> -  * ownership goes to the caller of thaw_super() which does unlock.
> -  *
> -  * FIXME: we should do this before return from freeze_super() after we
> -  * called sync_filesystem(sb) and s_op->freeze_fs(sb), and thaw_super()
> -  * should re-acquire these locks before s_op->unfreeze_fs(sb). However
> -  * this leads to lockdep false-positives, so currently we do the early
> -  * release right after acquire.
> -  */
> - percpu_rwsem_release(sb->s_writers.rw_sem + level-1, 0, _THIS_IP_);
>  }
>  
> -static void sb_freeze_unlock(struct super_block *sb)
> +/*
> + * We are going to return to userspace and forget about these locks, the
> + * ownership goes to the caller of thaw_super() which does unlock().
> + */
> +static void lockdep_sb_freeze_release(struct super_block *sb)
> +{
> + int level;
> +
> + for (level = SB_FREEZE_LEVELS - 1; level >= 0; level--)
> + 

[PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-09-26 Thread Oleg Nesterov
On 09/26, Jan Kara wrote:
>
> On Mon 26-09-16 18:08:06, Oleg Nesterov wrote:
> > +/*
> > + * Tell lockdep we are holding these locks before we call 
> > ->unfreeze_fs(sb).
> > + */
> > +static void sb_freeze_acquire(struct super_block *sb)
>
> Can we call this lockdep_sb_freeze_acquire() or something like that so that
> it is clear this is only about lockdep annotations? Similarly with
> sb_freeze_unlock()...

OK, thanks, done. See V2 below.

> and I hope you really tested
> there are no more lockdep false positives ;).

Heh ;) if only I knew how to test this... I ran the following script under qemu

mkfs.xfs -f /dev/vda
mkfs.xfs -f /dev/vdb

mkdir -p TEST SCRATCH

TEST_DEV=/dev/vda TEST_DIR=TEST SCRATCH_DEV=/dev/vdb 
SCRATCH_MNT=SCRATCH \
./check `grep -il freeze tests/*/???`

this time.

And yes, I'm afraid this change can uncover some false positives later. But at
the same time potentially it can find the real problems.

It would be nice to remove another hack in __sb_start_write under 
ifdef(CONFIG_LOCKDEP),
but iirc XFS actually takes the same rw_sem twice for reading, so we can't do 
this.

---
>From 70e774533ab6319f9b90a490b036150ad9604af7 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <o...@redhat.com>
Date: Mon, 26 Sep 2016 17:23:24 +0200
Subject: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and 
thaw_super() paths

sb_wait_write()->percpu_rwsem_release() fools lockdep to avoid the
false-positives. Now that xfs was fixed by Dave's commit dbad7c993053
("xfs: stop holding ILOCK over filldir callbacks") we can remove it and
change freeze_super() and thaw_super() to run with s_writers.rw_sem locks
held; we add two trivial helpers for that, lockdep_sb_freeze_release()
and lockdep_sb_freeze_acquire().

xfstests-dev/check `grep -il freeze tests/*/???` does not trigger any
warning from lockdep.

Signed-off-by: Oleg Nesterov <o...@redhat.com>
---
 fs/super.c | 37 +
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 2549896c..0447afe 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1214,25 +1214,34 @@ EXPORT_SYMBOL(__sb_start_write);
 static void sb_wait_write(struct super_block *sb, int level)
 {
percpu_down_write(sb->s_writers.rw_sem + level-1);
-   /*
-* We are going to return to userspace and forget about this lock, the
-* ownership goes to the caller of thaw_super() which does unlock.
-*
-* FIXME: we should do this before return from freeze_super() after we
-* called sync_filesystem(sb) and s_op->freeze_fs(sb), and thaw_super()
-* should re-acquire these locks before s_op->unfreeze_fs(sb). However
-* this leads to lockdep false-positives, so currently we do the early
-* release right after acquire.
-*/
-   percpu_rwsem_release(sb->s_writers.rw_sem + level-1, 0, _THIS_IP_);
 }
 
-static void sb_freeze_unlock(struct super_block *sb)
+/*
+ * We are going to return to userspace and forget about these locks, the
+ * ownership goes to the caller of thaw_super() which does unlock().
+ */
+static void lockdep_sb_freeze_release(struct super_block *sb)
+{
+   int level;
+
+   for (level = SB_FREEZE_LEVELS - 1; level >= 0; level--)
+   percpu_rwsem_release(sb->s_writers.rw_sem + level, 0, 
_THIS_IP_);
+}
+
+/*
+ * Tell lockdep we are holding these locks before we call ->unfreeze_fs(sb).
+ */
+static void lockdep_sb_freeze_acquire(struct super_block *sb)
 {
int level;
 
for (level = 0; level < SB_FREEZE_LEVELS; ++level)
percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0, 
_THIS_IP_);
+}
+
+static void sb_freeze_unlock(struct super_block *sb)
+{
+   int level;
 
for (level = SB_FREEZE_LEVELS - 1; level >= 0; level--)
percpu_up_write(sb->s_writers.rw_sem + level);
@@ -1328,6 +1337,7 @@ int freeze_super(struct super_block *sb)
 * when frozen is set to SB_FREEZE_COMPLETE, and for thaw_super().
 */
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+   lockdep_sb_freeze_release(sb);
up_write(>s_umount);
return 0;
 }
@@ -1354,11 +1364,14 @@ int thaw_super(struct super_block *sb)
goto out;
}
 
+   lockdep_sb_freeze_acquire(sb);
+
if (sb->s_op->unfreeze_fs) {
error = sb->s_op->unfreeze_fs(sb);
if (error) {
printk(KERN_ERR
"VFS:Filesystem thaw failed\n");
+   lockdep_sb_freeze_release(sb);
up_write(>s_umount);
return error;
}
-- 
2.5.0




[PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths

2016-09-26 Thread Oleg Nesterov
On 09/26, Jan Kara wrote:
>
> On Mon 26-09-16 18:08:06, Oleg Nesterov wrote:
> > +/*
> > + * Tell lockdep we are holding these locks before we call 
> > ->unfreeze_fs(sb).
> > + */
> > +static void sb_freeze_acquire(struct super_block *sb)
>
> Can we call this lockdep_sb_freeze_acquire() or something like that so that
> it is clear this is only about lockdep annotations? Similarly with
> sb_freeze_unlock()...

OK, thanks, done. See V2 below.

> and I hope you really tested
> there are no more lockdep false positives ;).

Heh ;) if only I knew how to test this... I ran the following script under qemu

mkfs.xfs -f /dev/vda
mkfs.xfs -f /dev/vdb

mkdir -p TEST SCRATCH

TEST_DEV=/dev/vda TEST_DIR=TEST SCRATCH_DEV=/dev/vdb 
SCRATCH_MNT=SCRATCH \
./check `grep -il freeze tests/*/???`

this time.

And yes, I'm afraid this change can uncover some false positives later. But at
the same time potentially it can find the real problems.

It would be nice to remove another hack in __sb_start_write under 
ifdef(CONFIG_LOCKDEP),
but iirc XFS actually takes the same rw_sem twice for reading, so we can't do 
this.

---
>From 70e774533ab6319f9b90a490b036150ad9604af7 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov 
Date: Mon, 26 Sep 2016 17:23:24 +0200
Subject: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and 
thaw_super() paths

sb_wait_write()->percpu_rwsem_release() fools lockdep to avoid the
false-positives. Now that xfs was fixed by Dave's commit dbad7c993053
("xfs: stop holding ILOCK over filldir callbacks") we can remove it and
change freeze_super() and thaw_super() to run with s_writers.rw_sem locks
held; we add two trivial helpers for that, lockdep_sb_freeze_release()
and lockdep_sb_freeze_acquire().

xfstests-dev/check `grep -il freeze tests/*/???` does not trigger any
warning from lockdep.

Signed-off-by: Oleg Nesterov 
---
 fs/super.c | 37 +
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 2549896c..0447afe 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1214,25 +1214,34 @@ EXPORT_SYMBOL(__sb_start_write);
 static void sb_wait_write(struct super_block *sb, int level)
 {
percpu_down_write(sb->s_writers.rw_sem + level-1);
-   /*
-* We are going to return to userspace and forget about this lock, the
-* ownership goes to the caller of thaw_super() which does unlock.
-*
-* FIXME: we should do this before return from freeze_super() after we
-* called sync_filesystem(sb) and s_op->freeze_fs(sb), and thaw_super()
-* should re-acquire these locks before s_op->unfreeze_fs(sb). However
-* this leads to lockdep false-positives, so currently we do the early
-* release right after acquire.
-*/
-   percpu_rwsem_release(sb->s_writers.rw_sem + level-1, 0, _THIS_IP_);
 }
 
-static void sb_freeze_unlock(struct super_block *sb)
+/*
+ * We are going to return to userspace and forget about these locks, the
+ * ownership goes to the caller of thaw_super() which does unlock().
+ */
+static void lockdep_sb_freeze_release(struct super_block *sb)
+{
+   int level;
+
+   for (level = SB_FREEZE_LEVELS - 1; level >= 0; level--)
+   percpu_rwsem_release(sb->s_writers.rw_sem + level, 0, 
_THIS_IP_);
+}
+
+/*
+ * Tell lockdep we are holding these locks before we call ->unfreeze_fs(sb).
+ */
+static void lockdep_sb_freeze_acquire(struct super_block *sb)
 {
int level;
 
for (level = 0; level < SB_FREEZE_LEVELS; ++level)
percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0, 
_THIS_IP_);
+}
+
+static void sb_freeze_unlock(struct super_block *sb)
+{
+   int level;
 
for (level = SB_FREEZE_LEVELS - 1; level >= 0; level--)
percpu_up_write(sb->s_writers.rw_sem + level);
@@ -1328,6 +1337,7 @@ int freeze_super(struct super_block *sb)
 * when frozen is set to SB_FREEZE_COMPLETE, and for thaw_super().
 */
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+   lockdep_sb_freeze_release(sb);
up_write(>s_umount);
return 0;
 }
@@ -1354,11 +1364,14 @@ int thaw_super(struct super_block *sb)
goto out;
}
 
+   lockdep_sb_freeze_acquire(sb);
+
if (sb->s_op->unfreeze_fs) {
error = sb->s_op->unfreeze_fs(sb);
if (error) {
printk(KERN_ERR
"VFS:Filesystem thaw failed\n");
+   lockdep_sb_freeze_release(sb);
up_write(>s_umount);
return error;
}
-- 
2.5.0