Re: [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck

2020-06-05 Thread Bob Peterson
Hi Andreas,

- Original Message -
(snip)
> > @@ -970,7 +969,16 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct
> > gfs2_glock *gl, u32 flags)
> >
> > if (!(flags & GFS2_LOG_HEAD_FLUSH_NORMAL)) {
> > if (!sdp->sd_log_idle) {
> > +   unsigned long start = jiffies;
> > +
> > for (;;) {
> > +   if (time_after(jiffies, start + (HZ *
> > 600))) {
> 
> This should probably have some rate limiting as well, for example:

Seems unnecessary. If the log flush gets stuck, the message will be printed
once, and at most every 10 minutes.
 
> start = jiffies;
> 
> I'm not sure what provides similar rate limiting in gfs2_ail1_flush.
> 
> > +   fs_err(sdp, "Error: In
> > gfs2_log_flush "
> > +  "for 10 minutes! t=%d\n",
> > +  current->journal_info ? 1 :
> > 0);
> 
> Please don't break the format string up like that; this makes grepping
> harder.

How do you propose I break present it without going over 80 character?
I could #define it as a constant, or put it into a separate function
that has less indentation, I suppose.
 
Bob



Re: [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck

2020-06-05 Thread Andreas Gruenbacher
On Fri, Jun 5, 2020 at 4:49 PM Bob Peterson  wrote:
> Hi Andreas,
>
> - Original Message -
> (snip)
> > > @@ -970,7 +969,16 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct
> > > gfs2_glock *gl, u32 flags)
> > >
> > > if (!(flags & GFS2_LOG_HEAD_FLUSH_NORMAL)) {
> > > if (!sdp->sd_log_idle) {
> > > +   unsigned long start = jiffies;
> > > +
> > > for (;;) {
> > > +   if (time_after(jiffies, start + (HZ *
> > > 600))) {
> >
> > This should probably have some rate limiting as well, for example:
>
> Seems unnecessary. If the log flush gets stuck, the message will be printed
> once, and at most every 10 minutes.

No, after ten minutes, the message will actually be printed for each
iteration of the loop. That's exactly why I was suggesting the rate
limiting.

> > start = jiffies;
> >
> > I'm not sure what provides similar rate limiting in gfs2_ail1_flush.
> >
> > > +   fs_err(sdp, "Error: In
> > > gfs2_log_flush "
> > > +  "for 10 minutes! t=%d\n",
> > > +  current->journal_info ? 1 :
> > > 0);
> >
> > Please don't break the format string up like that; this makes grepping
> > harder.
>
> How do you propose I break present it without going over 80 character?
> I could #define it as a constant, or put it into a separate function
> that has less indentation, I suppose.

There *is* no hard 80 character limit. The checkpatch warnings
shouldn't push us into making our code worse. Also note that since
commit bdc48fa11e, checkpatch will only warn about lines longer than
100 characters.

Thanks,
Andreas



Re: [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck

2020-06-05 Thread Bob Peterson
- Original Message -
> On Fri, Jun 5, 2020 at 4:49 PM Bob Peterson  wrote:
> > Hi Andreas,
> >
> > - Original Message -
> > (snip)
> > > > @@ -970,7 +969,16 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct
> > > > gfs2_glock *gl, u32 flags)
> > > >
> > > > if (!(flags & GFS2_LOG_HEAD_FLUSH_NORMAL)) {
> > > > if (!sdp->sd_log_idle) {
> > > > +   unsigned long start = jiffies;
> > > > +
> > > > for (;;) {
> > > > +   if (time_after(jiffies, start + (HZ *
> > > > 600))) {
> > >
> > > This should probably have some rate limiting as well, for example:
> >
> > Seems unnecessary. If the log flush gets stuck, the message will be printed
> > once, and at most every 10 minutes.
> 
> No, after ten minutes, the message will actually be printed for each
> iteration of the loop. That's exactly why I was suggesting the rate
> limiting.

No, after ten minutes it dumps the ail list so you can see the problem
and exits the loop with "break;".

The next time it enters the loop, it starts with a new value of start
which doesn't expire for another ten minutes.

Bob



Re: [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck

2020-06-05 Thread Andreas Gruenbacher
On Fri, Jun 5, 2020 at 6:15 PM Bob Peterson  wrote:
> - Original Message -
> > On Fri, Jun 5, 2020 at 4:49 PM Bob Peterson  wrote:
> > > Hi Andreas,
> > >
> > > - Original Message -
> > > (snip)
> > > > > @@ -970,7 +969,16 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct
> > > > > gfs2_glock *gl, u32 flags)
> > > > >
> > > > > if (!(flags & GFS2_LOG_HEAD_FLUSH_NORMAL)) {
> > > > > if (!sdp->sd_log_idle) {
> > > > > +   unsigned long start = jiffies;
> > > > > +
> > > > > for (;;) {
> > > > > +   if (time_after(jiffies, start + (HZ *
> > > > > 600))) {
> > > >
> > > > This should probably have some rate limiting as well, for example:
> > >
> > > Seems unnecessary. If the log flush gets stuck, the message will be 
> > > printed
> > > once, and at most every 10 minutes.
> >
> > No, after ten minutes, the message will actually be printed for each
> > iteration of the loop. That's exactly why I was suggesting the rate
> > limiting.
>
> No, after ten minutes it dumps the ail list so you can see the problem
> and exits the loop with "break;".
>
> The next time it enters the loop, it starts with a new value of start
> which doesn't expire for another ten minutes.

Ok, I misread.

Thanks,
Andreas



Re: [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck

2020-06-05 Thread Andreas Gruenbacher
Hi Bob,

On Tue, May 26, 2020 at 3:05 PM Bob Peterson  wrote:
> This adds checks for gfs2_log_flush being stuck, similarly to the check
> in gfs2_ail1_flush.
>
> Signed-off-by: Bob Peterson 
> ---
>  fs/gfs2/log.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
> index 1d51b4781bdd..636c82dda68b 100644
> --- a/fs/gfs2/log.c
> +++ b/fs/gfs2/log.c
> @@ -145,9 +145,6 @@ static void dump_ail_list(struct gfs2_sbd *sdp)
> struct gfs2_bufdata *bd;
> struct buffer_head *bh;
>
> -   fs_err(sdp, "Error: In gfs2_ail1_flush for ten minutes! t=%d\n",
> -  current->journal_info ? 1 : 0);
> -
> list_for_each_entry_reverse(tr, >sd_ail1_list, tr_list) {
> list_for_each_entry_reverse(bd, >tr_ail1_list,
> bd_ail_st_list) {
> @@ -197,6 +194,8 @@ void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct 
> writeback_control *wbc)
>  restart:
> ret = 0;
> if (time_after(jiffies, flush_start + (HZ * 600))) {
> +   fs_err(sdp, "Error: In gfs2_ail1_flush for ten minutes! "
> +  "t=%d\n", current->journal_info ? 1 : 0);
> dump_ail_list(sdp);
> goto out;
> }
> @@ -970,7 +969,16 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct 
> gfs2_glock *gl, u32 flags)
>
> if (!(flags & GFS2_LOG_HEAD_FLUSH_NORMAL)) {
> if (!sdp->sd_log_idle) {
> +   unsigned long start = jiffies;
> +
> for (;;) {
> +   if (time_after(jiffies, start + (HZ * 600))) {

This should probably have some rate limiting as well, for example:

start = jiffies;

I'm not sure what provides similar rate limiting in gfs2_ail1_flush.

> +   fs_err(sdp, "Error: In gfs2_log_flush 
> "
> +  "for 10 minutes! t=%d\n",
> +  current->journal_info ? 1 : 0);

Please don't break the format string up like that; this makes grepping harder.

> +   dump_ail_list(sdp);
> +   break;
> +   }
> gfs2_ail1_start(sdp);
> gfs2_ail1_wait(sdp);
> if (gfs2_ail1_empty(sdp, 0))
> --
> 2.26.2
>