Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
On Wed, 9 Jan 2019 at 19:44, Mark Syms wrote: > > So, actually the original comparison in the assert (dodgy scaling factor not > withstanding) was probably correct in that we don't ever want to remove all > blocks from the inode as one of them is used for the inode itself? Or do we > still think it should allow for change to be the negative of current blocks? It doesn't make a difference; all we care about is that we don't go negative. I think the updated comparison better reflects that goal. Andreas
Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
So, actually the original comparison in the assert (dodgy scaling factor not withstanding) was probably correct in that we don't ever want to remove all blocks from the inode as one of them is used for the inode itself? Or do we still think it should allow for change to be the negative of current blocks? Mark From: Andreas Gruenbacher Sent: Wednesday, 9 January 2019 17:24 To: Tim Smith CC: Mark Syms ,cluster-devel ,Igor Druzhinin Subject: Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64 On Wed, 9 Jan 2019 at 18:14, Tim Smith wrote: > > On Wednesday, 9 January 2019 15:35:05 GMT Andreas Gruenbacher wrote: > > On Wed, 9 Jan 2019 at 14:43, Mark Syms wrote: > > > We don't yet know how the assert got triggered as we've only seen it once > > > and in the original form it looks like it would be very hard to trigger in > > > any normal case (given that in default usage i_blocks should be at least 8 > > > times what any putative value for change could be). So, for the assert to > > > have triggered we've been asked to remove at least 8 times the number of > > > blocks currently allocated to the inode. Possible causes could be a double > > > release or some other higher level bug that will require further > > > investigation to uncover. > > > > The following change has at least survived xfstests: > > > > --- a/fs/gfs2/inode.h > > +++ b/fs/gfs2/inode.h > > @@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct > > inode *inode) > > > > static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change) > > { > > -gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks > > > -change)); > > -change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK); > > +change <<= inode->i_blkbits - 9; > > +gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change); > > inode->i_blocks += change; > > } > > > > Andreas > > I'll use > > change <<= (GFS2_SB(inode)->sd_sb.sb_bsize_shift - GFS2_BASIC_BLOCK_SHIFT); > > for consistency with the gfs2_get/set_inode_blocks(). I'll send the patch in a > bit. > > Given what it was like before, either i_blocks was already 0 or -change > somehow became stupidly huge. Anything else looks like it would be hard to > reproduce without mkfs.gfs2 -b 512 (so that GFS2_SB(inode)->sd_sb.sb_bsize == > GFS2_BASIC_BLOCK) which we don't do. > > I'll try to work out what could have caused it and see if we can provoke it > again. > > Out of curiosity I did a few tests where I created a file on GFS2, copied / > dev/null on top of it, and then ran stat on the file. It seems like GFS2 never > frees the last allocation on truncate; stat always reports 1, 2, 4 or 8 blocks > in use for a zero-length file depending on the underlying filesystem block > size, unlike (say) ext3/4 where it reports 0. I presume this is intentional so > maybe some corner case where it *is* trying to do that is the root of the > problem. Yes, that's intentional. An empty file without extended attributes consumes one block. Extended attributes consume at least one additional block. Andreas
Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
On Wed, 9 Jan 2019 at 18:14, Tim Smith wrote: > > On Wednesday, 9 January 2019 15:35:05 GMT Andreas Gruenbacher wrote: > > On Wed, 9 Jan 2019 at 14:43, Mark Syms wrote: > > > We don't yet know how the assert got triggered as we've only seen it once > > > and in the original form it looks like it would be very hard to trigger in > > > any normal case (given that in default usage i_blocks should be at least 8 > > > times what any putative value for change could be). So, for the assert to > > > have triggered we've been asked to remove at least 8 times the number of > > > blocks currently allocated to the inode. Possible causes could be a double > > > release or some other higher level bug that will require further > > > investigation to uncover. > > > > The following change has at least survived xfstests: > > > > --- a/fs/gfs2/inode.h > > +++ b/fs/gfs2/inode.h > > @@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct > > inode *inode) > > > > static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change) > > { > > -gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks > > > -change)); > > -change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK); > > +change <<= inode->i_blkbits - 9; > > +gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change); > > inode->i_blocks += change; > > } > > > > Andreas > > I'll use > > change <<= (GFS2_SB(inode)->sd_sb.sb_bsize_shift - GFS2_BASIC_BLOCK_SHIFT); > > for consistency with the gfs2_get/set_inode_blocks(). I'll send the patch in a > bit. > > Given what it was like before, either i_blocks was already 0 or -change > somehow became stupidly huge. Anything else looks like it would be hard to > reproduce without mkfs.gfs2 -b 512 (so that GFS2_SB(inode)->sd_sb.sb_bsize == > GFS2_BASIC_BLOCK) which we don't do. > > I'll try to work out what could have caused it and see if we can provoke it > again. > > Out of curiosity I did a few tests where I created a file on GFS2, copied / > dev/null on top of it, and then ran stat on the file. It seems like GFS2 never > frees the last allocation on truncate; stat always reports 1, 2, 4 or 8 blocks > in use for a zero-length file depending on the underlying filesystem block > size, unlike (say) ext3/4 where it reports 0. I presume this is intentional so > maybe some corner case where it *is* trying to do that is the root of the > problem. Yes, that's intentional. An empty file without extended attributes consumes one block. Extended attributes consume at least one additional block. Andreas
Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
Hi, On 09/01/2019 17:14, Tim Smith wrote: On Wednesday, 9 January 2019 15:35:05 GMT Andreas Gruenbacher wrote: On Wed, 9 Jan 2019 at 14:43, Mark Syms wrote: We don't yet know how the assert got triggered as we've only seen it once and in the original form it looks like it would be very hard to trigger in any normal case (given that in default usage i_blocks should be at least 8 times what any putative value for change could be). So, for the assert to have triggered we've been asked to remove at least 8 times the number of blocks currently allocated to the inode. Possible causes could be a double release or some other higher level bug that will require further investigation to uncover. The following change has at least survived xfstests: --- a/fs/gfs2/inode.h +++ b/fs/gfs2/inode.h @@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct inode *inode) static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change) { -gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks > -change)); -change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK); +change <<= inode->i_blkbits - 9; +gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change); inode->i_blocks += change; } Andreas I'll use change <<= (GFS2_SB(inode)->sd_sb.sb_bsize_shift - GFS2_BASIC_BLOCK_SHIFT); for consistency with the gfs2_get/set_inode_blocks(). I'll send the patch in a bit. Given what it was like before, either i_blocks was already 0 or -change somehow became stupidly huge. Anything else looks like it would be hard to reproduce without mkfs.gfs2 -b 512 (so that GFS2_SB(inode)->sd_sb.sb_bsize == GFS2_BASIC_BLOCK) which we don't do. I'll try to work out what could have caused it and see if we can provoke it again. Out of curiosity I did a few tests where I created a file on GFS2, copied / dev/null on top of it, and then ran stat on the file. It seems like GFS2 never frees the last allocation on truncate; stat always reports 1, 2, 4 or 8 blocks in use for a zero-length file depending on the underlying filesystem block size, unlike (say) ext3/4 where it reports 0. I presume this is intentional so maybe some corner case where it *is* trying to do that is the root of the problem. That is because gfs2 uses a block for each inode, so it makes sense to account for it in that way. For ext* the inodes are in separate areas of the disk, and they only use up a partial block, and they are also counted separately too. So it is a historical design difference I think, Steve.
Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
On Wednesday, 9 January 2019 15:35:05 GMT Andreas Gruenbacher wrote: > On Wed, 9 Jan 2019 at 14:43, Mark Syms wrote: > > We don't yet know how the assert got triggered as we've only seen it once > > and in the original form it looks like it would be very hard to trigger in > > any normal case (given that in default usage i_blocks should be at least 8 > > times what any putative value for change could be). So, for the assert to > > have triggered we've been asked to remove at least 8 times the number of > > blocks currently allocated to the inode. Possible causes could be a double > > release or some other higher level bug that will require further > > investigation to uncover. > > The following change has at least survived xfstests: > > --- a/fs/gfs2/inode.h > +++ b/fs/gfs2/inode.h > @@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct > inode *inode) > > static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change) > { > -gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks > > -change)); > -change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK); > +change <<= inode->i_blkbits - 9; > +gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change); > inode->i_blocks += change; > } > > Andreas I'll use change <<= (GFS2_SB(inode)->sd_sb.sb_bsize_shift - GFS2_BASIC_BLOCK_SHIFT); for consistency with the gfs2_get/set_inode_blocks(). I'll send the patch in a bit. Given what it was like before, either i_blocks was already 0 or -change somehow became stupidly huge. Anything else looks like it would be hard to reproduce without mkfs.gfs2 -b 512 (so that GFS2_SB(inode)->sd_sb.sb_bsize == GFS2_BASIC_BLOCK) which we don't do. I'll try to work out what could have caused it and see if we can provoke it again. Out of curiosity I did a few tests where I created a file on GFS2, copied / dev/null on top of it, and then ran stat on the file. It seems like GFS2 never frees the last allocation on truncate; stat always reports 1, 2, 4 or 8 blocks in use for a zero-length file depending on the underlying filesystem block size, unlike (say) ext3/4 where it reports 0. I presume this is intentional so maybe some corner case where it *is* trying to do that is the root of the problem. -- Tim Smith
Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
On Wed, 9 Jan 2019 at 14:43, Mark Syms wrote: > We don't yet know how the assert got triggered as we've only seen it once > and in the original form it looks like it would be very hard to trigger in > any normal case (given that in default usage i_blocks should be at least 8 > times what any putative value for change could be). So, for the assert to > have triggered we've been asked to remove at least 8 times the number of > blocks currently allocated to the inode. Possible causes could be a double > release or some other higher level bug that will require further > investigation to uncover. > The following change has at least survived xfstests: --- a/fs/gfs2/inode.h +++ b/fs/gfs2/inode.h @@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct inode *inode) static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change) { -gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks > -change)); -change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK); +change <<= inode->i_blkbits - 9; +gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change); inode->i_blocks += change; } Andreas
Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
We don't yet know how the assert got triggered as we've only seen it once and in the original form it looks like it would be very hard to trigger in any normal case (given that in default usage i_blocks should be at least 8 times what any putative value for change could be). So, for the assert to have triggered we've been asked to remove at least 8 times the number of blocks currently allocated to the inode. Possible causes could be a double release or some other higher level bug that will require further investigation to uncover. Mark. -Original Message- From: Andreas Gruenbacher Sent: 09 January 2019 13:31 To: Tim Smith Cc: cluster-devel ; Igor Druzhinin ; Mark Syms Subject: Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64 Mark and Tim, On Wed, 9 Jan 2019 at 13:05, Tim Smith wrote: > On Tuesday, 8 January 2019 13:32:20 GMT Mark Syms wrote: > > Hi, > > > > We've seen this in testing with 4.19. > > > > Full trace is at the bottom. > > > > Looking at the code though it looks like it will assert if the value > > of change is equal to the number of blocks currently allocated to the inode. > > Is this expected or should the assert be using >= instead of > ? > > It's weirder. Looking at > > static inline void gfs2_add_inode_blocks(struct inode *inode, s64 > change) { > gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks > > -change)); > change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK); > inode->i_blocks += change; > } > > where the BUG is happening, "inode->i_blocks" is counting of 512b > blocks and "change" is in units of whatever-the-superblock-said which > will probably be counting 4096b blocks, so the comparison makes no sense. indeed. I wonder why this hasn't been discovered long ago. > It should probably read > > static inline void gfs2_add_inode_blocks(struct inode *inode, s64 > change) { > change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK); > gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >= > -change)); > inode->i_blocks += change; > } > > so I'll make a patch for that unless someone wants to correct me. You can just shift by (inode->i_blkbits - 9). Can you still trigger the assert with this fix? Thanks, Andreas
Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
Mark and Tim, On Wed, 9 Jan 2019 at 13:05, Tim Smith wrote: > On Tuesday, 8 January 2019 13:32:20 GMT Mark Syms wrote: > > Hi, > > > > We've seen this in testing with 4.19. > > > > Full trace is at the bottom. > > > > Looking at the code though it looks like it will assert if the value of > > change is equal to the number of blocks currently allocated to the inode. > > Is this expected or should the assert be using >= instead of > ? > > It's weirder. Looking at > > static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change) > { > gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks > > -change)); > change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK); > inode->i_blocks += change; > } > > where the BUG is happening, "inode->i_blocks" is counting of 512b blocks and > "change" is in units of whatever-the-superblock-said which will probably be > counting 4096b blocks, so the comparison makes no sense. indeed. I wonder why this hasn't been discovered long ago. > It should probably read > > static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change) > { > change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK); > gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >= > -change)); > inode->i_blocks += change; > } > > so I'll make a patch for that unless someone wants to correct me. You can just shift by (inode->i_blkbits - 9). Can you still trigger the assert with this fix? Thanks, Andreas
Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
On Tuesday, 8 January 2019 13:32:20 GMT Mark Syms wrote: > Hi, > > We've seen this in testing with 4.19. > > Full trace is at the bottom. > > Looking at the code though it looks like it will assert if the value of > change is equal to the number of blocks currently allocated to the inode. > Is this expected or should the assert be using >= instead of > ? It's weirder. Looking at static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change) { gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks > -change)); change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK); inode->i_blocks += change; } where the BUG is happening, "inode->i_blocks" is counting of 512b blocks and "change" is in units of whatever-the-superblock-said which will probably be counting 4096b blocks, so the comparison makes no sense. It should probably read static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change) { change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK); gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >= -change)); inode->i_blocks += change; } so I'll make a patch for that unless someone wants to correct me. As for why it's triggering, given the current scale difference I can only assume it's an attempt to remove blocks from something which has none left. I'd suspect a boundary case in sweep_bh_for_rgrps(). (It *is* possible currently to remove 1 4096b block from something which has 8 512b blocks so you can get to the state of "inode->i_blocks == 0)" It would only blow up all the time if you did a "mkfs.gfs2 -b 512" which I don't imagine anyone does that often...) > Thanks, > > Mark > > -- > > 2018-12-24 00:07:03 UTC] [ 3366.209273] kernel BUG at fs/gfs2/inode.h:64! [snip] > [2018-12-24 00:07:03 UTC] [ 3366.209802] Call Trace: > [2018-12-24 00:07:03 UTC] [ 3366.209835] punch_hole+0xf4f/0x10d0 [gfs2] > [2018-12-24 00:07:03 UTC] [ 3366.209866] ? __switch_to_asm+0x40/0x70 > [2018-12-24 00:07:03 UTC] [ 3366.209907] ? __switch_to_asm+0x40/0x70 > [2018-12-24 00:07:03 UTC] [ 3366.209924] ? __switch_to_asm+0x40/0x70 > [2018-12-24 00:07:03 UTC] [ 3366.209949] ? punch_hole+0xb3e/0x10d0 [gfs2] > [2018-12-24 00:07:03 UTC] [ 3366.209983] gfs2_evict_inode+0x57b/0x680 [gfs2] > [2018-12-24 00:07:03 UTC] [ 3366.210036] ? > __inode_wait_for_writeback+0x75/0xe0 > [2018-12-24 00:07:03 UTC] [ 3366.210066] ? gfs2_evict_inode+0x1c7/0x680 > [gfs2] > [2018-12-24 00:07:03 UTC] [ 3366.210090] evict+0xc6/0x1a0 > [2018-12-24 00:07:03 UTC] [ 3366.210151] delete_work_func+0x60/0x70 [gfs2] > [2018-12-24 00:07:03 UTC] [ 3366.210177] process_one_work+0x165/0x370 > [2018-12-24 00:07:03 UTC] [ 3366.210197] worker_thread+0x49/0x3e0 > [2018-12-24 00:07:03 UTC] [ 3366.210216] kthread+0xf8/0x130 > [2018-12-24 00:07:03 UTC] [ 3366.210235] ? rescuer_thread+0x310/0x310 > [2018-12-24 00:07:03 UTC] [ 3366.210284] ? kthread_bind+0x10/0x10 > [2018-12-24 00:07:04 UTC] [ 3366.210301] ret_from_fork+0x35/0x40 -- Tim Smith