[Cluster-devel] [GIT PULL] gfs2: 4.19 fix

2018-10-09 Thread Andreas Gruenbacher
Hi Greg,

Could you please pull the following gfs2 fix for 4.19? It fixes a
regression introduced in commit 64bc06bb32ee ("gfs2: iomap buffered
write support") in 4.19-rc1.

Thanks,
Andreas

--

The following changes since commit 0854ba5ff5c938307cd783e996b62c83f1ce923b:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
(2018-10-08 16:25:01 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git
gfs2-4.19.fixes2

for you to fetch changes up to dc480feb454a975b7ee2c18a2f98fb34e04d3baf:

  gfs2: Fix iomap buffered write support for journaled files
(2018-10-09 18:20:13 +0200)


Andreas Gruenbacher (1):
  gfs2: Fix iomap buffered write support for journaled files

 fs/gfs2/bmap.c | 4 ++++
 1 file changed, 4 insertions(+)



[Cluster-devel] [PATCH] gfs2: Fix iomap buffered write support for journaled files

2018-10-09 Thread Andreas Gruenbacher
Commit 64bc06bb32ee broke buffered writes to journaled files (chattr
+j): we'll try to journal the buffer heads of the page being written to
in gfs2_iomap_journaled_page_done.  However, the iomap code no longer
creates buffer heads, so we'll BUG() in gfs2_page_add_databufs.  Fix
that by creating buffer heads ourselves when needed.

Signed-off-by: Andreas Gruenbacher 
---
 fs/gfs2/bmap.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 03128ed1f34e..3c159a7f9a9e 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -975,6 +975,10 @@ static void gfs2_iomap_journaled_page_done(struct inode *inode, loff_t pos,
 {
struct gfs2_inode *ip = GFS2_I(inode);
 
+   if (!page_has_buffers(page)) {
+   create_empty_buffers(page, inode->i_sb->s_blocksize,
+(1 << BH_Dirty)|(1 << BH_Uptodate));
+   }
gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
 }
 
-- 
2.17.1



Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

2018-10-09 Thread Tim Smith
On Tuesday, 9 October 2018 15:47:21 BST Andreas Gruenbacher wrote:
> Mark and Tim,
> 
> does the following patch fix the problem, perhaps?

The assertion being that there are several waiters. Certainly possible.

We'll give it a try. It takes ~12 hours to hit one instance of this so we've 
got plenty of thinking time between runs.

> Thanks,
> Andreas
> 
> ---
>  fs/gfs2/glock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 4614ee25f621..71e7c380d4c4 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -121,7 +121,7 @@ static void wake_up_glock(struct gfs2_glock *gl)
>   wait_queue_head_t *wq = glock_waitqueue(&gl->gl_name);
> 
>   if (waitqueue_active(wq))
> - __wake_up(wq, TASK_NORMAL, 1, &gl->gl_name);
> + __wake_up(wq, TASK_NORMAL, 0, &gl->gl_name);
>  }
> 
>  static void gfs2_glock_dealloc(struct rcu_head *rcu)


-- 
Tim Smith 




Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

2018-10-09 Thread Tim Smith
On Tuesday, 9 October 2018 14:00:34 BST Andreas Gruenbacher wrote:
> On Tue, 9 Oct 2018 at 14:46, Tim Smith  wrote:
> > On Tuesday, 9 October 2018 13:34:47 BST you wrote:
> > > On Mon, 8 Oct 2018 at 15:27, Tim Smith  wrote:
> > > > On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
> > > > > Hi,
> > > > > 
> > > > > On 08/10/18 14:10, Tim Smith wrote:
> > > > > > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
> > > > > >> On 08/10/18 13:59, Mark Syms wrote:
> > > > > >>> That sounds entirely reasonable so long as you are absolutely
> > > > > >>> sure
> > > > > >>> that
> > > > > >>> nothing is ever going to mess with that glock, we erred on the
> > > > > >>> side
> > > > > >>> of
> > > > > >>> more caution not knowing whether it would be guaranteed safe or
> > > > > >>> not.
> > > > > >>> 
> > > > > >>> Thanks,
> > > > > >>> 
> > > > > >>>   Mark
> > > > > >> 
> > > > > >> We should have a look at the history to see how that wait got
> > > > > >> added.
> > > > > >> However the "dead" flag here means "don't touch this glock" and
> > > > > >> is
> > > > > >> there
> > > > > >> so that we can separate the marking dead from the actual removal
> > > > > >> from
> > > > > >> the list (which simplifies the locking during the scanning
> > > > > >> procedures)
> > > > > > 
> > > > > > You beat me to it :-)
> > > > > > 
> > > > > > I think there might be a bit of a problem inserting a new entry
> > > > > > with
> > > > > > the
> > > > > > same name before the old entry has been fully destroyed (or at
> > > > > > least
> > > > > > removed), which would be why the schedule() is there.
> > > > > 
> > > > > If the old entry is marked dead, all future lookups should ignore
> > > > > it. We
> > > > > should only have a single non-dead entry at a time, but that doesn't
> > > > > seem like it should need us to wait for it.
> > > > 
> > > > On the second call we do have the new glock to insert as arg2, so we
> > > > could
> > > > try to swap them cleanly, yeah.
> > > > 
> > > > > If we do discover that the wait is really required, then it sounds
> > > > > like
> > > > > as you mentioned above there is a lost wakeup, and that must
> > > > > presumably
> > > > > be on a code path that sets the dead flag and then fails to send a
> > > > > wake
> > > > > up later on. If we can drop the wait in the first place, that seems
> > > > > like
> > > > > a better plan,
> > > > 
> > > > Ooooh, I wonder if these two lines:
> > > > wake_up_glock(gl);
> > > > call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
> > > > 
> > > > in gfs2_glock_free() are the wrong way round?
> > > 
> > > No, what's important here is that the wake-up happens after the glock
> > > being freed has been removed from the rhashtable, and that's the case
> > > here. wake_up_glock actually accesses the glock, which may no longer
> > > be around after the call_rcu, so swapping the two lines would
> > > introduce a use-after-free bug.
> > 
> > Yes, realised that :-/
> > 
> > > There must be another reason for the missed wake-up. I'll have to
> > > study the code some more.
> > 
> > I think it needs to go into gfs2_glock_dealloc(), in such a way as to
> > avoid
> > that problem. Currently working out a patch to do that.
> 
> That doesn't sound right: find_insert_glock is waiting for the glock
> to be removed from the rhashtable. In gfs2_glock_free, we remove the
> glock from the rhashtable and then we do the wake-up. Delaying the
> wakeup further isn't going to help (but it might hide the problem).

The only way we can get the problem we're seeing is if we get an effective 
order of

T0: wake_up_glock()
T1: prepare_to_wait()
T1: schedule()

so clearly there's a way for that to happen. Any other order and either 
schedule() doesn't sleep or it gets woken.
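The T0/T1 ordering above is the classic lost-wakeup race. As a hedged illustration (a userspace pthreads sketch, not the kernel waitqueue API; all names here are invented for the example), the usual defence is to re-check the condition after registering as a waiter, under the same lock the waker holds when it sets the condition:

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wq   = PTHREAD_COND_INITIALIZER;
static bool glock_gone;          /* models "glock removed from the rhashtable" */

/* Waker: models gfs2_glock_free() -- make the condition true, then wake,
 * both under the lock, so the waiter cannot miss the transition. */
static void *waker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    glock_gone = true;           /* remove from the table ... */
    pthread_cond_broadcast(&wq); /* ... then wake_up_glock() */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Waiter: models find_insert_glock() -- re-check the condition after
 * "preparing to wait"; pthread_cond_wait atomically drops the lock and
 * sleeps, so the T0 wake / T1 prepare / T1 sleep order cannot strand us. */
static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!glock_gone)
        pthread_cond_wait(&wq, &lock);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Run one waiter/waker pair to completion; returns 0 if nobody hung. */
int run_once(void)
{
    pthread_t t1, t2;
    glock_gone = false;
    pthread_create(&t1, NULL, waiter, NULL);
    pthread_create(&t2, NULL, waker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return glock_gone ? 0 : -1;
}
```

If the waker published the condition outside the lock, or the waiter checked it only before registering, the T0/T1/T1 interleaving above could leave the waiter asleep forever, which is exactly the symptom the schedule timeout papers over.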

The only way I can see at the moment to ensure that wake_up_glock() *cannot* 
get called until after prepare_to_wait() is to delay it until the read-side 
critical sections are done, and the first place that's got that property is 
the start of gfs2_glock_dealloc(). The alternative would be to add 
synchronize_rcu() to gfs2_glock_free(), and I'm guessing there's a reason 
it's using call_rcu() instead.

I'll keep thinking about it.

> Instrumenting glock_wake_function might help though.
> 
> Thanks,
> Andreas


-- 
Tim Smith 




Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

2018-10-09 Thread Andreas Gruenbacher
Mark and Tim,

does the following patch fix the problem, perhaps?

Thanks,
Andreas

---
 fs/gfs2/glock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4614ee25f621..71e7c380d4c4 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -121,7 +121,7 @@ static void wake_up_glock(struct gfs2_glock *gl)
wait_queue_head_t *wq = glock_waitqueue(&gl->gl_name);
 
if (waitqueue_active(wq))
-   __wake_up(wq, TASK_NORMAL, 1, &gl->gl_name);
+   __wake_up(wq, TASK_NORMAL, 0, &gl->gl_name);
 }
 
 static void gfs2_glock_dealloc(struct rcu_head *rcu)
-- 
2.17.1
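For context on the one-character change: __wake_up()'s third argument, nr_exclusive, bounds how many waiters flagged WQ_FLAG_EXCLUSIVE are woken (non-exclusive waiters are always woken), and 0 removes the bound, so every matching waiter wakes. A toy sketch of just that bookkeeping, assuming several exclusive waiters queued on the same name (which is the hypothesis being tested); this is not the kernel implementation:

```c
#define MAX_WAITERS 8

struct waiter {
    int exclusive;   /* stands in for WQ_FLAG_EXCLUSIVE */
    int woken;
};

/* Wake every non-exclusive waiter and at most nr_exclusive exclusive
 * ones; nr_exclusive == 0 disables the limit, mirroring the kernel's
 * "!--nr_exclusive" underflow trick in __wake_up_common(). */
void toy_wake_up(struct waiter *w, int n, int nr_exclusive)
{
    for (int i = 0; i < n; i++) {
        w[i].woken = 1;
        if (w[i].exclusive && !--nr_exclusive)
            break;
    }
}
```

So with nr_exclusive == 1, if several tasks were somehow queued exclusively on the same glock name, only the first would be woken; passing 0 wakes them all, which is what this patch tries.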



Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

2018-10-09 Thread Tim Smith
On Tuesday, 9 October 2018 13:34:47 BST you wrote:
> On Mon, 8 Oct 2018 at 15:27, Tim Smith  wrote:
> > On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
> > > Hi,
> > > 
> > > On 08/10/18 14:10, Tim Smith wrote:
> > > > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
> > > >> On 08/10/18 13:59, Mark Syms wrote:
> > > >>> That sounds entirely reasonable so long as you are absolutely sure
> > > >>> that
> > > >>> nothing is ever going to mess with that glock, we erred on the side
> > > >>> of
> > > >>> more caution not knowing whether it would be guaranteed safe or not.
> > > >>> 
> > > >>> Thanks,
> > > >>> 
> > > >>>   Mark
> > > >> 
> > > >> We should have a look at the history to see how that wait got added.
> > > >> However the "dead" flag here means "don't touch this glock" and is
> > > >> there
> > > >> so that we can separate the marking dead from the actual removal from
> > > >> the list (which simplifies the locking during the scanning
> > > >> procedures)
> > > > 
> > > > You beat me to it :-)
> > > > 
> > > > I think there might be a bit of a problem inserting a new entry with
> > > > the
> > > > same name before the old entry has been fully destroyed (or at least
> > > > removed), which would be why the schedule() is there.
> > > 
> > > If the old entry is marked dead, all future lookups should ignore it. We
> > > should only have a single non-dead entry at a time, but that doesn't
> > > seem like it should need us to wait for it.
> > 
> > On the second call we do have the new glock to insert as arg2, so we could
> > try to swap them cleanly, yeah.
> > 
> > > If we do discover that the wait is really required, then it sounds like
> > > as you mentioned above there is a lost wakeup, and that must presumably
> > > be on a code path that sets the dead flag and then fails to send a wake
> > > up later on. If we can drop the wait in the first place, that seems like
> > > a better plan,
> > 
> > Ooooh, I wonder if these two lines:
> > wake_up_glock(gl);
> > call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
> > 
> > in gfs2_glock_free() are the wrong way round?
> 
> No, what's important here is that the wake-up happens after the glock
> being freed has been removed from the rhashtable, and that's the case
> here. wake_up_glock actually accesses the glock, which may no longer
> be around after the call_rcu, so swapping the two lines would
> introduce a use-after-free bug.

Yes, realised that :-/

> There must be another reason for the missed wake-up. I'll have to
> study the code some more.

I think it needs to go into gfs2_glock_dealloc(), in such a way as to avoid 
that problem. Currently working out a patch to do that.

> Thanks,
> Andreas

-- 
Tim Smith 




Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

2018-10-09 Thread Andreas Gruenbacher
On Mon, 8 Oct 2018 at 15:27, Tim Smith  wrote:
>
> On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
> > Hi,
> >
> > On 08/10/18 14:10, Tim Smith wrote:
> > > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
> > >> On 08/10/18 13:59, Mark Syms wrote:
> > >>> That sounds entirely reasonable so long as you are absolutely sure that
> > >>> nothing is ever going to mess with that glock, we erred on the side of
> > >>> more caution not knowing whether it would be guaranteed safe or not.
> > >>>
> > >>> Thanks,
> > >>>
> > >>>   Mark
> > >>
> > >> We should have a look at the history to see how that wait got added.
> > >> However the "dead" flag here means "don't touch this glock" and is there
> > >> so that we can separate the marking dead from the actual removal from
> > >> the list (which simplifies the locking during the scanning procedures)
> > >
> > > You beat me to it :-)
> > >
> > > I think there might be a bit of a problem inserting a new entry with the
> > > same name before the old entry has been fully destroyed (or at least
> > > removed), which would be why the schedule() is there.
> >
> > If the old entry is marked dead, all future lookups should ignore it. We
> > should only have a single non-dead entry at a time, but that doesn't
> > seem like it should need us to wait for it.
>
> On the second call we do have the new glock to insert as arg2, so we could try
> to swap them cleanly, yeah.
>
> > If we do discover that the wait is really required, then it sounds like
> > as you mentioned above there is a lost wakeup, and that must presumably
> > be on a code path that sets the dead flag and then fails to send a wake
> > up later on. If we can drop the wait in the first place, that seems like
> > a better plan,
>
> Ooooh, I wonder if these two lines:
>
> wake_up_glock(gl);
> call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
>
> in gfs2_glock_free() are the wrong way round?

No, what's important here is that the wake-up happens after the glock
being freed has been removed from the rhashtable, and that's the case
here. wake_up_glock actually accesses the glock, which may no longer
be around after the call_rcu, so swapping the two lines would
introduce a use-after-free bug.

There must be another reason for the missed wake-up. I'll have to
study the code some more.

Thanks,
Andreas



Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

2018-10-09 Thread Mark Syms
We think we have, just making a build to test.

Will follow up later.

Mark.

-Original Message-
From: Steven Whitehouse  
Sent: 09 October 2018 09:41
To: Mark Syms ; Tim Smith 
Cc: cluster-devel@redhat.com; Ross Lagerwall 
Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find 
insert glock



On 09/10/18 09:13, Mark Syms wrote:
> Having swapped the lines below around we still see the timeout on 
> schedule fire, but only once in a fairly mega stress test. This is why 
> we weren't worried about the timeout being HZ: the situation is hardly 
> ever hit, as having to wait is rare and normally we are woken from 
> schedule; without a timeout on schedule we would never wake up, so a 
> rare occurrence of waiting a second really doesn't seem too bad.
>
> Mark.
We should still get to the bottom of why the wake up is missing though, since 
without that fix we won't know if there is something else wrong somewhere,

Steve.

>
> -Original Message-
> From: Tim Smith 
> Sent: 08 October 2018 14:27
> To: Steven Whitehouse 
> Cc: Mark Syms ; cluster-devel@redhat.com; Ross 
> Lagerwall 
> Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in 
> find insert glock
>
> On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
>> Hi,
>>
>> On 08/10/18 14:10, Tim Smith wrote:
>>> On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
> >>>> On 08/10/18 13:59, Mark Syms wrote:
> >>>>> That sounds entirely reasonable so long as you are absolutely sure 
> >>>>> that nothing is ever going to mess with that glock, we erred on 
> >>>>> the side of more caution not knowing whether it would be guaranteed safe 
> >>>>> or not.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>>   Mark
> >>>> We should have a look at the history to see how that wait got added.
> >>>> However the "dead" flag here means "don't touch this glock" and is 
> >>>> there so that we can separate the marking dead from the actual 
> >>>> removal from the list (which simplifies the locking during the 
> >>>> scanning procedures)
>>> You beat me to it :-)
>>>
>>> I think there might be a bit of a problem inserting a new entry with 
>>> the same name before the old entry has been fully destroyed (or at 
>>> least removed), which would be why the schedule() is there.
>> If the old entry is marked dead, all future lookups should ignore it.
>> We should only have a single non-dead entry at a time, but that 
>> doesn't seem like it should need us to wait for it.
> On the second call we do have the new glock to insert as arg2, so we could 
> try to swap them cleanly, yeah.
>
>> If we do discover that the wait is really required, then it sounds 
>> like as you mentioned above there is a lost wakeup, and that must 
>> presumably be on a code path that sets the dead flag and then fails 
>> to send a wake up later on. If we can drop the wait in the first 
>> place, that seems like a better plan,
> Ooooh, I wonder if these two lines:
>
>   wake_up_glock(gl);
>   call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
>
> in gfs2_glock_free() are the wrong way round?
>
> --
> Tim Smith 
>
>




Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

2018-10-09 Thread Steven Whitehouse




On 09/10/18 09:13, Mark Syms wrote:
> Having swapped the lines below around we still see the timeout on
> schedule fire, but only once in a fairly mega stress test. This is why
> we weren't worried about the timeout being HZ: the situation is hardly
> ever hit, as having to wait is rare and normally we are woken from
> schedule; without a timeout on schedule we would never wake up, so a
> rare occurrence of waiting a second really doesn't seem too bad.
>
> Mark.
We should still get to the bottom of why the wake up is missing though, 
since without that fix we won't know if there is something else wrong 
somewhere,

Steve.

> -Original Message-
> From: Tim Smith
> Sent: 08 October 2018 14:27
> To: Steven Whitehouse
> Cc: Mark Syms ; cluster-devel@redhat.com; Ross Lagerwall
> Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find
> insert glock
>
> On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
>> Hi,
>>
>> On 08/10/18 14:10, Tim Smith wrote:
>>> On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
>>>> On 08/10/18 13:59, Mark Syms wrote:
>>>>> That sounds entirely reasonable so long as you are absolutely sure
>>>>> that nothing is ever going to mess with that glock, we erred on
>>>>> the side of more caution not knowing whether it would be guaranteed
>>>>> safe or not.
>>>>>
>>>>> Thanks,
>>>>>
>>>>>   Mark
>>>> We should have a look at the history to see how that wait got added.
>>>> However the "dead" flag here means "don't touch this glock" and is
>>>> there so that we can separate the marking dead from the actual
>>>> removal from the list (which simplifies the locking during the
>>>> scanning procedures)
>>> You beat me to it :-)
>>>
>>> I think there might be a bit of a problem inserting a new entry with
>>> the same name before the old entry has been fully destroyed (or at
>>> least removed), which would be why the schedule() is there.
>> If the old entry is marked dead, all future lookups should ignore it.
>> We should only have a single non-dead entry at a time, but that
>> doesn't seem like it should need us to wait for it.
> On the second call we do have the new glock to insert as arg2, so we
> could try to swap them cleanly, yeah.
>
>> If we do discover that the wait is really required, then it sounds
>> like as you mentioned above there is a lost wakeup, and that must
>> presumably be on a code path that sets the dead flag and then fails to
>> send a wake up later on. If we can drop the wait in the first place,
>> that seems like a better plan,
> Ooooh, I wonder if these two lines:
>
> wake_up_glock(gl);
> call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
>
> in gfs2_glock_free() are the wrong way round?
>
> --
> Tim Smith






Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find insert glock

2018-10-09 Thread Mark Syms
Having swapped the lines below around we still see the timeout on schedule 
fire, but only once in a fairly mega stress test. This is why we weren't 
worried about the timeout being HZ: the situation is hardly ever hit, as 
having to wait is rare and normally we are woken from schedule; without a 
timeout on schedule we would never wake up, so a rare occurrence of waiting 
a second really doesn't seem too bad.

Mark.

-Original Message-
From: Tim Smith  
Sent: 08 October 2018 14:27
To: Steven Whitehouse 
Cc: Mark Syms ; cluster-devel@redhat.com; Ross Lagerwall 

Subject: Re: [Cluster-devel] [PATCH 1/2] GFS2: use schedule timeout in find 
insert glock

On Monday, 8 October 2018 14:13:10 BST Steven Whitehouse wrote:
> Hi,
> 
> On 08/10/18 14:10, Tim Smith wrote:
> > On Monday, 8 October 2018 14:03:24 BST Steven Whitehouse wrote:
> >> On 08/10/18 13:59, Mark Syms wrote:
> >>> That sounds entirely reasonable so long as you are absolutely sure 
> >>> that nothing is ever going to mess with that glock, we erred on 
> >>> the side of more caution not knowing whether it would be guaranteed safe 
> >>> or not.
> >>> 
> >>> Thanks,
> >>> 
> >>>   Mark
> >> 
> >> We should have a look at the history to see how that wait got added.
> >> However the "dead" flag here means "don't touch this glock" and is 
> >> there so that we can separate the marking dead from the actual 
> >> removal from the list (which simplifies the locking during the 
> >> scanning procedures)
> > 
> > You beat me to it :-)
> > 
> > I think there might be a bit of a problem inserting a new entry with 
> > the same name before the old entry has been fully destroyed (or at 
> > least removed), which would be why the schedule() is there.
> 
> If the old entry is marked dead, all future lookups should ignore it. 
> We should only have a single non-dead entry at a time, but that 
> doesn't seem like it should need us to wait for it.

On the second call we do have the new glock to insert as arg2, so we could try 
to swap them cleanly, yeah.

> If we do discover that the wait is really required, then it sounds 
> like as you mentioned above there is a lost wakeup, and that must 
> presumably be on a code path that sets the dead flag and then fails to 
> send a wake up later on. If we can drop the wait in the first place, 
> that seems like a better plan,

Ooooh, I wonder if these two lines:

wake_up_glock(gl);
call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);

in gfs2_glock_free() are the wrong way round?

--
Tim Smith