Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64

2019-01-09 Thread Andreas Gruenbacher
On Wed, 9 Jan 2019 at 19:44, Mark Syms  wrote:
>
> So, actually the original comparison in the assert (dodgy scaling factor not 
> withstanding) was probably correct in that we don't ever want to remove all 
> blocks from the inode as one of them is used for the inode itself? Or do we 
> still think it should allow for change to be the negative of current blocks?

It doesn't make a difference; all we care about is that we don't go
negative. I think the updated comparison better reflects that goal.

Andreas



Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64

2019-01-09 Thread Mark Syms
So, actually the original comparison in the assert (dodgy scaling factor not 
withstanding) was probably correct in that we don't ever want to remove all 
blocks from the inode as one of them is used for the inode itself? Or do we 
still think it should allow for change to be the negative of current blocks?

Mark

From: Andreas Gruenbacher 
Sent: Wednesday, 9 January 2019 17:24
To: Tim Smith 
CC: Mark Syms ,cluster-devel 
,Igor Druzhinin 
Subject: Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64


On Wed, 9 Jan 2019 at 18:14, Tim Smith  wrote:
>
> On Wednesday, 9 January 2019 15:35:05 GMT Andreas Gruenbacher wrote:
> > On Wed, 9 Jan 2019 at 14:43, Mark Syms  wrote:
> > > We don't yet know how the assert got triggered as we've only seen it once
> > > and in the original form it looks like it would be very hard to trigger in
> > > any normal case (given that in default usage i_blocks should be at least 8
> > > times what any putative value for change could be). So, for the assert to
> > > have triggered we've been asked to remove at least 8 times the number of
> > > blocks currently allocated to the inode. Possible causes could be a double
> > > release or some other higher level bug that will require further
> > > investigation to uncover.
> >
> > The following change has at least survived xfstests:
> >
> > --- a/fs/gfs2/inode.h
> > +++ b/fs/gfs2/inode.h
> > @@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct
> > inode *inode)
> >
> >  static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
> >  {
> > -gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >
> > -change));
> > -change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
> > +change <<= inode->i_blkbits - 9;
> > +gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change);
> > inode->i_blocks += change;
> >  }
> >
> > Andreas
>
> I'll use
>
> change <<= (GFS2_SB(inode)->sd_sb.sb_bsize_shift - GFS2_BASIC_BLOCK_SHIFT);
>
> for consistency with the gfs2_get/set_inode_blocks(). I'll send the patch in a
> bit.
>
> Given what it was like before, either i_blocks was already 0 or -change
> somehow became stupidly huge. Anything else looks like it would be hard to
> reproduce without mkfs.gfs2 -b 512 (so that GFS2_SB(inode)->sd_sb.sb_bsize ==
> GFS2_BASIC_BLOCK) which we don't do.
>
> I'll try to work out what could have caused it and see if we can provoke it
> again.
>
> Out of curiosity I did a few tests where I created a file on GFS2, copied /
> dev/null on top of it, and then ran stat on the file. It seems like GFS2 never
> frees the last allocation on truncate; stat always reports 1, 2, 4 or 8 blocks
> in use for a zero-length file depending on the underlying filesystem block
> size, unlike (say) ext3/4 where it reports 0. I presume this is intentional so
> maybe some corner case where it *is* trying to do that is the root of the
> problem.

Yes, that's intentional. An empty file without extended attributes
consumes one block. Extended attributes consume at least one
additional block.

Andreas


Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64

2019-01-09 Thread Andreas Gruenbacher
On Wed, 9 Jan 2019 at 18:14, Tim Smith  wrote:
>
> On Wednesday, 9 January 2019 15:35:05 GMT Andreas Gruenbacher wrote:
> > On Wed, 9 Jan 2019 at 14:43, Mark Syms  wrote:
> > > We don't yet know how the assert got triggered as we've only seen it once
> > > and in the original form it looks like it would be very hard to trigger in
> > > any normal case (given that in default usage i_blocks should be at least 8
> > > times what any putative value for change could be). So, for the assert to
> > > have triggered we've been asked to remove at least 8 times the number of
> > > blocks currently allocated to the inode. Possible causes could be a double
> > > release or some other higher level bug that will require further
> > > investigation to uncover.
> >
> > The following change has at least survived xfstests:
> >
> > --- a/fs/gfs2/inode.h
> > +++ b/fs/gfs2/inode.h
> > @@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct
> > inode *inode)
> >
> >  static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
> >  {
> > -gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >
> > -change));
> > -change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
> > +change <<= inode->i_blkbits - 9;
> > +gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change);
> > inode->i_blocks += change;
> >  }
> >
> > Andreas
>
> I'll use
>
> change <<= (GFS2_SB(inode)->sd_sb.sb_bsize_shift - GFS2_BASIC_BLOCK_SHIFT);
>
> for consistency with the gfs2_get/set_inode_blocks(). I'll send the patch in a
> bit.
>
> Given what it was like before, either i_blocks was already 0 or -change
> somehow became stupidly huge. Anything else looks like it would be hard to
> reproduce without mkfs.gfs2 -b 512 (so that GFS2_SB(inode)->sd_sb.sb_bsize ==
> GFS2_BASIC_BLOCK) which we don't do.
>
> I'll try to work out what could have caused it and see if we can provoke it
> again.
>
> Out of curiosity I did a few tests where I created a file on GFS2, copied /
> dev/null on top of it, and then ran stat on the file. It seems like GFS2 never
> frees the last allocation on truncate; stat always reports 1, 2, 4 or 8 blocks
> in use for a zero-length file depending on the underlying filesystem block
> size, unlike (say) ext3/4 where it reports 0. I presume this is intentional so
> maybe some corner case where it *is* trying to do that is the root of the
> problem.

Yes, that's intentional. An empty file without extended attributes
consumes one block. Extended attributes consume at least one
additional block.

Andreas



Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64

2019-01-09 Thread Steven Whitehouse

Hi,

On 09/01/2019 17:14, Tim Smith wrote:

On Wednesday, 9 January 2019 15:35:05 GMT Andreas Gruenbacher wrote:

On Wed, 9 Jan 2019 at 14:43, Mark Syms  wrote:

We don't yet know how the assert got triggered as we've only seen it once
and in the original form it looks like it would be very hard to trigger in
any normal case (given that in default usage i_blocks should be at least 8
times what any putative value for change could be). So, for the assert to
have triggered we've been asked to remove at least 8 times the number of
blocks currently allocated to the inode. Possible causes could be a double
release or some other higher level bug that will require further
investigation to uncover.

The following change has at least survived xfstests:

--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct
inode *inode)

  static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
  {
-gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >
-change));
-change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
+change <<= inode->i_blkbits - 9;
+gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change);
inode->i_blocks += change;
  }

Andreas

I'll use

change <<= (GFS2_SB(inode)->sd_sb.sb_bsize_shift - GFS2_BASIC_BLOCK_SHIFT);

for consistency with the gfs2_get/set_inode_blocks(). I'll send the patch in a
bit.

Given what it was like before, either i_blocks was already 0 or -change
somehow became stupidly huge. Anything else looks like it would be hard to
reproduce without mkfs.gfs2 -b 512 (so that GFS2_SB(inode)->sd_sb.sb_bsize ==
GFS2_BASIC_BLOCK) which we don't do.

I'll try to work out what could have caused it and see if we can provoke it
again.

Out of curiosity I did a few tests where I created a file on GFS2, copied /
dev/null on top of it, and then ran stat on the file. It seems like GFS2 never
frees the last allocation on truncate; stat always reports 1, 2, 4 or 8 blocks
in use for a zero-length file depending on the underlying filesystem block
size, unlike (say) ext3/4 where it reports 0. I presume this is intentional so
maybe some corner case where it *is* trying to do that is the root of the
problem.

That is because gfs2 uses a block for each inode, so it makes sense to 
account for it in that way. For ext* the inodes are in separate areas of 
the disk, and they only use up a partial block, and they are also 
counted separately too. So it is a historical design difference I think,


Steve.



Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64

2019-01-09 Thread Tim Smith
On Wednesday, 9 January 2019 15:35:05 GMT Andreas Gruenbacher wrote:
> On Wed, 9 Jan 2019 at 14:43, Mark Syms  wrote:
> > We don't yet know how the assert got triggered as we've only seen it once
> > and in the original form it looks like it would be very hard to trigger in
> > any normal case (given that in default usage i_blocks should be at least 8
> > times what any putative value for change could be). So, for the assert to
> > have triggered we've been asked to remove at least 8 times the number of
> > blocks currently allocated to the inode. Possible causes could be a double
> > release or some other higher level bug that will require further
> > investigation to uncover.
> 
> The following change has at least survived xfstests:
> 
> --- a/fs/gfs2/inode.h
> +++ b/fs/gfs2/inode.h
> @@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct
> inode *inode)
> 
>  static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
>  {
> -gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >
> -change));
> -change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
> +change <<= inode->i_blkbits - 9;
> +gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change);
> inode->i_blocks += change;
>  }
> 
> Andreas

I'll use 

change <<= (GFS2_SB(inode)->sd_sb.sb_bsize_shift - GFS2_BASIC_BLOCK_SHIFT);

for consistency with the gfs2_get/set_inode_blocks(). I'll send the patch in a 
bit.

Given what it was like before, either i_blocks was already 0 or -change 
somehow became stupidly huge. Anything else looks like it would be hard to 
reproduce without mkfs.gfs2 -b 512 (so that GFS2_SB(inode)->sd_sb.sb_bsize == 
GFS2_BASIC_BLOCK) which we don't do.

I'll try to work out what could have caused it and see if we can provoke it 
again.

Out of curiosity I did a few tests where I created a file on GFS2, copied /
dev/null on top of it, and then ran stat on the file. It seems like GFS2 never 
frees the last allocation on truncate; stat always reports 1, 2, 4 or 8 blocks 
in use for a zero-length file depending on the underlying filesystem block 
size, unlike (say) ext3/4 where it reports 0. I presume this is intentional so 
maybe some corner case where it *is* trying to do that is the root of the 
problem.

-- 
Tim Smith 




Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64

2019-01-09 Thread Andreas Gruenbacher
On Wed, 9 Jan 2019 at 14:43, Mark Syms  wrote:

> We don't yet know how the assert got triggered as we've only seen it once
> and in the original form it looks like it would be very hard to trigger in
> any normal case (given that in default usage i_blocks should be at least 8
> times what any putative value for change could be). So, for the assert to
> have triggered we've been asked to remove at least 8 times the number of
> blocks currently allocated to the inode. Possible causes could be a double
> release or some other higher level bug that will require further
> investigation to uncover.
>

The following change has at least survived xfstests:

--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct
inode *inode)

 static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
 {
-gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >
-change));
-change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
+change <<= inode->i_blkbits - 9;
+gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change);
 inode->i_blocks += change;
 }

Andreas


Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64

2019-01-09 Thread Mark Syms
We don't yet know how the assert got triggered as we've only seen it once and 
in the original form it looks like it would be very hard to trigger in any 
normal case (given that in default usage i_blocks should be at least 8 times 
what any putative value for change could be). So, for the assert to have 
triggered we've been asked to remove at least 8 times the number of blocks 
currently allocated to the inode. Possible causes could be a double release or 
some other higher level bug that will require further investigation to uncover.

Mark.

-Original Message-
From: Andreas Gruenbacher  
Sent: 09 January 2019 13:31
To: Tim Smith 
Cc: cluster-devel ; Igor Druzhinin 
; Mark Syms 
Subject: Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64

Mark and Tim,

On Wed, 9 Jan 2019 at 13:05, Tim Smith  wrote:
> On Tuesday, 8 January 2019 13:32:20 GMT Mark Syms wrote:
> > Hi,
> >
> > We've seen this in testing with 4.19.
> >
> > Full trace is at the bottom.
> >
> > Looking at the code though it looks like it will assert if the value 
> > of change is equal to the number of blocks currently allocated to the inode.
> > Is this expected or should the assert be using >= instead of > ?
>
> It's weirder. Looking at
>
> static inline void gfs2_add_inode_blocks(struct inode *inode, s64 
> change) {
> gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks > 
> -change));
> change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
> inode->i_blocks += change;
> }
>
> where the BUG is happening, "inode->i_blocks" is counting of 512b 
> blocks and "change" is in units of whatever-the-superblock-said which 
> will probably be counting 4096b blocks, so the comparison makes no sense.

indeed. I wonder why this hasn't been discovered long ago.

> It should probably read
>
> static inline void gfs2_add_inode_blocks(struct inode *inode, s64 
> change) {
> change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
> gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >= 
> -change));
> inode->i_blocks += change;
> }
>
> so I'll make a patch for that unless someone wants to correct me.

You can just shift by (inode->i_blkbits - 9).

Can you still trigger the assert with this fix?

Thanks,
Andreas



Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64

2019-01-09 Thread Andreas Gruenbacher
Mark and Tim,

On Wed, 9 Jan 2019 at 13:05, Tim Smith  wrote:
> On Tuesday, 8 January 2019 13:32:20 GMT Mark Syms wrote:
> > Hi,
> >
> > We've seen this in testing with 4.19.
> >
> > Full trace is at the bottom.
> >
> > Looking at the code though it looks like it will assert if the value of
> > change is equal to the number of blocks currently allocated to the inode.
> > Is this expected or should the assert be using >= instead of > ?
>
> It's weirder. Looking at
>
> static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
> {
> gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks > 
> -change));
> change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
> inode->i_blocks += change;
> }
>
> where the BUG is happening, "inode->i_blocks" is counting of 512b blocks and
> "change" is in units of whatever-the-superblock-said which will probably be
> counting 4096b blocks, so the comparison makes no sense.

indeed. I wonder why this hasn't been discovered long ago.

> It should probably read
>
> static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
> {
> change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
> gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >= 
> -change));
> inode->i_blocks += change;
> }
>
> so I'll make a patch for that unless someone wants to correct me.

You can just shift by (inode->i_blkbits - 9).

Can you still trigger the assert with this fix?

Thanks,
Andreas



Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64

2019-01-09 Thread Tim Smith
On Tuesday, 8 January 2019 13:32:20 GMT Mark Syms wrote:
> Hi,
> 
> We've seen this in testing with 4.19.
> 
> Full trace is at the bottom.
> 
> Looking at the code though it looks like it will assert if the value of
> change is equal to the number of blocks currently allocated to the inode.
> Is this expected or should the assert be using >= instead of > ?

It's weirder. Looking at

static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
{
gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks > -change));
change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
inode->i_blocks += change;
}

where the BUG is happening, "inode->i_blocks" is counting of 512b blocks and
"change" is in units of whatever-the-superblock-said which will probably be
counting 4096b blocks, so the comparison makes no sense.

It should probably read

static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
{
change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >= 
-change));
inode->i_blocks += change;
}

so I'll make a patch for that unless someone wants to correct me.

As for why it's triggering, given the current scale difference I can only
assume it's an attempt to remove blocks from something which has none left.
I'd suspect a boundary case in sweep_bh_for_rgrps().

(It *is* possible currently to remove 1 4096b block from something which has
8 512b blocks so you can get to the state of "inode->i_blocks == 0)" It would
only blow up all the time if you did a "mkfs.gfs2 -b 512" which I don't
imagine anyone does that often...)

> Thanks,
> 
>   Mark
> 
> --
> 
> 2018-12-24 00:07:03 UTC] [ 3366.209273] kernel BUG at fs/gfs2/inode.h:64!

[snip]

> [2018-12-24 00:07:03 UTC] [ 3366.209802] Call Trace:
> [2018-12-24 00:07:03 UTC] [ 3366.209835]  punch_hole+0xf4f/0x10d0 [gfs2]
> [2018-12-24 00:07:03 UTC] [ 3366.209866]  ? __switch_to_asm+0x40/0x70
> [2018-12-24 00:07:03 UTC] [ 3366.209907]  ? __switch_to_asm+0x40/0x70
> [2018-12-24 00:07:03 UTC] [ 3366.209924]  ? __switch_to_asm+0x40/0x70
> [2018-12-24 00:07:03 UTC] [ 3366.209949]  ? punch_hole+0xb3e/0x10d0 [gfs2]
> [2018-12-24 00:07:03 UTC] [ 3366.209983]  gfs2_evict_inode+0x57b/0x680 [gfs2]
> [2018-12-24 00:07:03 UTC] [ 3366.210036]  ? 
> __inode_wait_for_writeback+0x75/0xe0
> [2018-12-24 00:07:03 UTC] [ 3366.210066]  ? gfs2_evict_inode+0x1c7/0x680 
> [gfs2]
> [2018-12-24 00:07:03 UTC] [ 3366.210090]  evict+0xc6/0x1a0
> [2018-12-24 00:07:03 UTC] [ 3366.210151]  delete_work_func+0x60/0x70 [gfs2]
> [2018-12-24 00:07:03 UTC] [ 3366.210177]  process_one_work+0x165/0x370
> [2018-12-24 00:07:03 UTC] [ 3366.210197]  worker_thread+0x49/0x3e0
> [2018-12-24 00:07:03 UTC] [ 3366.210216]  kthread+0xf8/0x130
> [2018-12-24 00:07:03 UTC] [ 3366.210235]  ? rescuer_thread+0x310/0x310
> [2018-12-24 00:07:03 UTC] [ 3366.210284]  ? kthread_bind+0x10/0x10
> [2018-12-24 00:07:04 UTC] [ 3366.210301]  ret_from_fork+0x35/0x40


-- 
Tim Smith