Re: [Cluster-devel] [PATCH 1/2] gfs2: Fix occasional glock use-after-free

2019-04-09 Thread Andreas Gruenbacher
On Tue, 9 Apr 2019 at 17:37, Ross Lagerwall  wrote:
> On 4/5/19 6:50 PM, Andreas Gruenbacher wrote:
> > Hi Ross,
> >
> > On Tue, 2 Apr 2019 at 00:59, Andreas Gruenbacher  
> > wrote:
> >> thanks for the great analysis.
> >>
> >> On Tue, 26 Mar 2019 at 20:14, Bob Peterson  wrote:
> >>> - Original Message -
>  6. gfs2_log_flush() continues and calls revoke_lo_after_commit() and
>  uses the freed glock (stack trace above).
> 
>  Should evict_inode call gfs2_ail_flush() or something so that the revoke
>  is written before it drops its reference to the glock?
> 
>  Or is there something else that is meant to keep the glock around if the
>  inode is evicted while there is a linked gfs2_bufdata sitting on some
>  AIL list?
> 
>  Let me know if any more specific information is needed since I now have
>  a test setup that can usually reproduce this within 10 minutes.
> >>>
> >>> Very interesting.
> >>>
> >>> It's not unusual for glocks to outlive their inodes, but I'm not sure
> >>> what the right answer is in this case. Either the revoke should
> >>> take an additional reference to the glock, and not let go until the
> >>> revoke is written by some log flush, or else the evict needs to do the
> >>> log flush to make sure the revoke is committed. But we've had issues with
> >>> evict in the past, so we need to be careful about how we fix it.
> >>> Andreas and I will look into the best way to fix it. Thanks again for 
> >>> your help.
> >>
> >> Usually, glocks are attached to inodes or other objects. When the
> >> state of the glock changes, the go_lock, go_sync, and go_inval
> >> operations determine what happens to the attached object. In
> >> gfs2_evict_inode, we disassociate the inode from the glock, do the
> >> necessary cleanup work locally, and put the glock asynchronously when
> >> needed, though. We do that in PF_MEMALLOC context to avoid
> >> deadlocking; see commit 71c1b2136835 ("gfs2: gfs2_evict_inode: Put
> >> glocks asynchronously"). Even if we didn't do that, there could still
> >> be other glock holders like the glock state machine. There is no
> >> guarantee that the glock will go away anytime soon, or actually at
> >> all: it could get reused by another instance of the same inode. So
> >> waiting for the glock to go away first in gfs2_evict_inode definitely
> >> is not an option.
> >>
> >> This basically answers your above question: gfs2_evict_inode should
> >> definitely clean up before putting the glock. I'm just not sure how to
> >> best achieve that, yet.
> >
> > after more discussions, Bob convinced me that it makes perfect sense
> > to not write out outstanding revokes in gfs2_evict_inode. We really
> > want to delay writing revokes as long as possible so that as many of
> > the revokes as possible will go away (either because the blocks are
> > re-added to the journal and the revokes are "unrevoked", or because
> > the journal wraps around). So the right fix here is indeed taking
> > additional glock references.
> >
> > I've come up with a patch that takes one additional reference for each
> > glock instead of one reference for each bufdata object; that should
> > scale better. I'll post that and a follow-up patch separately.
> >
> > Could you please verify?
> >
>
> I've tested the patches and no longer see the use-after-free.

Ok great, we'll line the fix up for the next merge window.

Thanks,
Andreas



Re: [Cluster-devel] [PATCH 1/2] gfs2: Fix occasional glock use-after-free

2019-04-09 Thread Ross Lagerwall

On 4/5/19 6:50 PM, Andreas Gruenbacher wrote:

Hi Ross,

On Tue, 2 Apr 2019 at 00:59, Andreas Gruenbacher  wrote:

thanks for the great analysis.

On Tue, 26 Mar 2019 at 20:14, Bob Peterson  wrote:

- Original Message -

6. gfs2_log_flush() continues and calls revoke_lo_after_commit() and
uses the freed glock (stack trace above).

Should evict_inode call gfs2_ail_flush() or something so that the revoke
is written before it drops its reference to the glock?

Or is there something else that is meant to keep the glock around if the
inode is evicted while there is a linked gfs2_bufdata sitting on some
AIL list?

Let me know if any more specific information is needed since I now have
a test setup that can usually reproduce this within 10 minutes.


Very interesting.

It's not unusual for glocks to outlive their inodes, but I'm not sure
what the right answer is in this case. Either the revoke should
take an additional reference to the glock, and not let go until the
revoke is written by some log flush, or else the evict needs to do the
log flush to make sure the revoke is committed. But we've had issues with
evict in the past, so we need to be careful about how we fix it.
Andreas and I will look into the best way to fix it. Thanks again for your help.


Usually, glocks are attached to inodes or other objects. When the
state of the glock changes, the go_lock, go_sync, and go_inval
operations determine what happens to the attached object. In
gfs2_evict_inode, we disassociate the inode from the glock, do the
necessary cleanup work locally, and put the glock asynchronously when
needed, though. We do that in PF_MEMALLOC context to avoid
deadlocking; see commit 71c1b2136835 ("gfs2: gfs2_evict_inode: Put
glocks asynchronously"). Even if we didn't do that, there could still
be other glock holders like the glock state machine. There is no
guarantee that the glock will go away anytime soon, or actually at
all: it could get reused by another instance of the same inode. So
waiting for the glock to go away first in gfs2_evict_inode definitely
is not an option.

This basically answers your above question: gfs2_evict_inode should
definitely clean up before putting the glock. I'm just not sure how to
best achieve that, yet.


after more discussions, Bob convinced me that it makes perfect sense
to not write out outstanding revokes in gfs2_evict_inode. We really
want to delay writing revokes as long as possible so that as many of
the revokes as possible will go away (either because the blocks are
re-added to the journal and the revokes are "unrevoked", or because
the journal wraps around). So the right fix here is indeed taking
additional glock references.

I've come up with a patch that takes one additional reference for each
glock instead of one reference for each bufdata object; that should
scale better. I'll post that and a follow-up patch separately.

Could you please verify?



I've tested the patches and no longer see the use-after-free.

Thanks for working on this!

--
Ross Lagerwall



Re: [Cluster-devel] [PATCH 02/19] gfs2: Introduce concept of a pending withdraw

2019-04-09 Thread Andreas Gruenbacher
Hi Bob,

On Wed, 27 Mar 2019 at 13:35, Bob Peterson  wrote:
> File system withdraws can be delayed when inconsistencies are
> discovered when we cannot withdraw immediately, for example, when
> critical spin_locks are held. But delaying the withdraw can cause
> gfs2 to ignore the error and keep running for a short period of time.
> For example, an rgrp glock may be dequeued and demoted while there
> are still buffers that haven't been properly revoked, due to io
> errors writing to the journal.
>
> This patch introduces a new concept of a delayed withdraw, which
> means an inconsistency has been discovered and we need to withdraw
> at the earliest possible opportunity. In these cases, we aren't
> quite withdrawn yet, but we still need to not dequeue glocks and
> other critical things. If we dequeue the glocks and the withdraw
> results in our journal being replayed, the replay could overwrite
> data that's been modified by a different node that acquired the
> glock in the meantime.
>
> Signed-off-by: Bob Peterson 
> ---
>  fs/gfs2/aops.c   |  4 ++--
>  fs/gfs2/file.c   |  2 +-
>  fs/gfs2/glock.c  |  7 +++
>  fs/gfs2/glops.c  |  2 +-
>  fs/gfs2/incore.h |  1 +
>  fs/gfs2/log.c| 20 
>  fs/gfs2/meta_io.c|  6 +++---
>  fs/gfs2/ops_fstype.c |  3 +--
>  fs/gfs2/quota.c  |  2 +-
>  fs/gfs2/super.c  |  6 +++---
>  fs/gfs2/sys.c|  2 +-
>  fs/gfs2/util.c   |  1 +
>  fs/gfs2/util.h   |  8 
>  13 files changed, 34 insertions(+), 30 deletions(-)
>
> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> index 05dd78f4b2b3..0d3cde8a61cd 100644
> --- a/fs/gfs2/aops.c
> +++ b/fs/gfs2/aops.c
> @@ -521,7 +521,7 @@ static int __gfs2_readpage(void *file, struct page *page)
> error = mpage_readpage(page, gfs2_block_map);
> }
>
> -   if (unlikely(test_bit(SDF_SHUTDOWN, >sd_flags)))
> +   if (unlikely(withdrawn(sdp)))
> return -EIO;
>
> return error;
> @@ -638,7 +638,7 @@ static int gfs2_readpages(struct file *file, struct 
> address_space *mapping,
> gfs2_glock_dq();
>  out_uninit:
> gfs2_holder_uninit();
> -   if (unlikely(test_bit(SDF_SHUTDOWN, >sd_flags)))
> +   if (unlikely(withdrawn(sdp)))
> ret = -EIO;
> return ret;
>  }
> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> index 58a768e59712..2a3ac9747d0d 100644
> --- a/fs/gfs2/file.c
> +++ b/fs/gfs2/file.c
> @@ -1169,7 +1169,7 @@ static int gfs2_lock(struct file *file, int cmd, struct 
> file_lock *fl)
> cmd = F_SETLK;
> fl->fl_type = F_UNLCK;
> }
> -   if (unlikely(test_bit(SDF_SHUTDOWN, >sd_flags))) {
> +   if (unlikely(withdrawn(sdp))) {
> if (fl->fl_type == F_UNLCK)
> locks_lock_file_wait(file, fl);
> return -EIO;
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index d32964cd1117..4330164de8bd 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -542,7 +542,7 @@ __acquires(>gl_lockref.lock)
> unsigned int lck_flags = (unsigned int)(gh ? gh->gh_flags : 0);
> int ret;
>
> -   if (unlikely(test_bit(SDF_SHUTDOWN, >sd_flags)) &&
> +   if (unlikely(withdrawn(sdp)) &&
> target != LM_ST_UNLOCKED)
> return;
> lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP |
> @@ -579,8 +579,7 @@ __acquires(>gl_lockref.lock)
> }
> else if (ret) {
> fs_err(sdp, "lm_lock ret %d\n", ret);
> -   GLOCK_BUG_ON(gl, !test_bit(SDF_SHUTDOWN,
> -  >sd_flags));
> +   GLOCK_BUG_ON(gl, !withdrawn(sdp));
> }
> } else { /* lock_nolock */
> finish_xmote(gl, target);
> @@ -1092,7 +1091,7 @@ int gfs2_glock_nq(struct gfs2_holder *gh)
> struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
> int error = 0;
>
> -   if (unlikely(test_bit(SDF_SHUTDOWN, >sd_flags)))
> +   if (unlikely(withdrawn(sdp)))
> return -EIO;
>
> if (test_bit(GLF_LRU, >gl_flags))
> diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
> index 78510ab91835..719961b5a511 100644
> --- a/fs/gfs2/glops.c
> +++ b/fs/gfs2/glops.c
> @@ -538,7 +538,7 @@ static int freeze_go_xmote_bh(struct gfs2_glock *gl, 
> struct gfs2_holder *gh)
> gfs2_consist(sdp);
>
> /*  Initialize some head of the log stuff  */
> -   if (!test_bit(SDF_SHUTDOWN, >sd_flags)) {
> +   if (!withdrawn(sdp)) {
> sdp->sd_log_sequence = head.lh_sequence + 1;
> gfs2_log_pointers_init(sdp, head.lh_blkno);
> }
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index 76336b592030..c6984265807f 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -620,6 +620,7 @@ enum {
> 

Re: [Cluster-devel] [PATCH 01/19] gfs2: log error reform

2019-04-09 Thread Andreas Gruenbacher
Bob,

On Wed, 27 Mar 2019 at 13:36, Bob Peterson  wrote:
> Before this patch, gfs2 kept track of journal io errors in two
> places (sd_log_error and the SDF_AIL1_IO_ERROR flag in sd_flags.
> This patch consolidates the two by eliminating the SDF_AIL1_IO_ERROR
> flag in favor of an atomic count of journal errors, sd_log_errors.
> When the first io error occurs and its value is incremented to 1,
> sd_log_error is set. Thus, sd_log_error reflects the first error
> we encountered writing to the journal. In future patches, we will
> take advantage of this by checking a single value (sd_log_errors)
> rather than having to check both the flag and the sd_log_error
> when reacting to io errors.
>
> Signed-off-by: Bob Peterson 
> ---
>  fs/gfs2/incore.h | 4 ++--
>  fs/gfs2/log.c| 7 ---
>  fs/gfs2/lops.c   | 7 +--
>  fs/gfs2/ops_fstype.c | 1 +
>  fs/gfs2/quota.c  | 6 --
>  5 files changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index cdf07b408f54..76336b592030 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -620,7 +620,6 @@ enum {
> SDF_RORECOVERY  = 7, /* read only recovery */
> SDF_SKIP_DLM_UNLOCK = 8,
> SDF_FORCE_AIL_FLUSH = 9,
> -   SDF_AIL1_IO_ERROR   = 10,
>  };
>
>  enum gfs2_freeze_state {
> @@ -829,7 +828,8 @@ struct gfs2_sbd {
> atomic_t sd_log_in_flight;
> struct bio *sd_log_bio;
> wait_queue_head_t sd_log_flush_wait;
> -   int sd_log_error;
> +   int sd_log_error; /* First log error */
> +   atomic_t sd_log_errors; /* Count of log errors */
>
> atomic_t sd_reserving_log;
> wait_queue_head_t sd_reserving_log_wait;
> diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
> index b8830fda51e8..0717b4d4828b 100644
> --- a/fs/gfs2/log.c
> +++ b/fs/gfs2/log.c
> @@ -109,8 +109,8 @@ __acquires(>sd_ail_lock)
>
> if (!buffer_busy(bh)) {
> if (!buffer_uptodate(bh) &&
> -   !test_and_set_bit(SDF_AIL1_IO_ERROR,
> - >sd_flags)) {
> +   atomic_add_return(1, >sd_log_errors) == 1) {
> +   sdp->sd_log_error = -EIO;
> gfs2_io_error_bh(sdp, bh);
> *withdraw = true;
> }
> @@ -209,7 +209,8 @@ static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, 
> struct gfs2_trans *tr,
> if (buffer_busy(bh))
> continue;
> if (!buffer_uptodate(bh) &&
> -   !test_and_set_bit(SDF_AIL1_IO_ERROR, >sd_flags)) {
> +   atomic_add_return(1, >sd_log_errors) == 1) {
> +   sdp->sd_log_error = -EIO;
> gfs2_io_error_bh(sdp, bh);
> *withdraw = true;
> }
> diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
> index 8722c60b11fe..627738cfe6bb 100644
> --- a/fs/gfs2/lops.c
> +++ b/fs/gfs2/lops.c
> @@ -211,8 +211,11 @@ static void gfs2_end_log_write(struct bio *bio)
> struct bvec_iter_all iter_all;
>
> if (bio->bi_status) {
> -   fs_err(sdp, "Error %d writing to journal, jid=%u\n",
> -  bio->bi_status, sdp->sd_jdesc->jd_jid);
> +   if (atomic_add_return(1, >sd_log_errors) == 1) {
> +   sdp->sd_log_error = bio->bi_status;
> +   fs_err(sdp, "Error %d writing to journal, jid=%u\n",
> +  bio->bi_status, sdp->sd_jdesc->jd_jid);
> +   }
> wake_up(>sd_logd_waitq);
> }
>
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index b041cb8ae383..77610be6c918 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -129,6 +129,7 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
>
> init_rwsem(>sd_log_flush_lock);
> atomic_set(>sd_log_in_flight, 0);
> +   atomic_set(>sd_log_errors, 0);
> atomic_set(>sd_reserving_log, 0);
> init_waitqueue_head(>sd_reserving_log_wait);
> init_waitqueue_head(>sd_log_flush_wait);
> diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
> index 2ae5a109eea7..009172ef4dfe 100644
> --- a/fs/gfs2/quota.c
> +++ b/fs/gfs2/quota.c
> @@ -1479,8 +1479,10 @@ static void quotad_error(struct gfs2_sbd *sdp, const 
> char *msg, int error)
> if (error == 0 || error == -EROFS)
> return;
> if (!test_bit(SDF_SHUTDOWN, >sd_flags)) {
> -   fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
> -   sdp->sd_log_error = error;
> +   if (atomic_add_return(1, >sd_log_errors) == 1) {
> +   sdp->sd_log_error = error;
> +   fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
> +   }
> wake_up(>sd_logd_waitq);
> }

Re: [Cluster-devel] [PATCH] libgfs2: Import gfs2_ondisk.h

2019-04-09 Thread Andrew Price

On 09/04/2019 14:03, Steven Whitehouse wrote:

On 09/04/2019 13:48, Andrew Price wrote:

On 09/04/2019 13:21, Steven Whitehouse wrote:
Those conversion functions are not sensible, thats why we got rid of 
them from the kernel code. 


Is it the functions that aren't sensible or the use of the 
gfs2_ondisk.h structs as the containers for the native endian data? 
I'm not sure I get why the kernel functions like gfs2_dinode_in() are 
considered sensible and gfs2-utils' gfs2_dinode_in(), which does a 
similar thing but with a different struct, isn't sensible.


Well in general we don't want to convert lots of fields in what is 
basically a copy. The inode, when it is read in is an exception to that 
mainly because we have to in order to make sure that the vfs level data 
is all up to date. Keeping the structs as containers is useful, so yes 
we want to retain that. In many cases though we only need a few fields 
from what can be quite large data structures, so in those cases we 
should read/update the fields that we care about for that particular 
operation, rather than converting the whole data structure each time. We 
got a fair speed up when we made that change in the kernel.


So generally I'd like to discourage the blanket conversion functions, 
though it is likely we'll need to retain a few of them, in favour of 
converting just the required fields at the point of use. This should be 
safe to do given that we have the ability to do compile time type 
checking - and lets try and include that in the tests that are always 
run before check in, to make sure that we don't land up with any 
mistakes. That would be a good addition to the tests I think,


Ah ok, that makes sense to me, thanks for explaining. I'm sure we could 
speed up bits of gfs2-utils by taking that approach too.


Andy



Re: [Cluster-devel] [PATCH] libgfs2: Import gfs2_ondisk.h

2019-04-09 Thread Steven Whitehouse

Hi,

On 09/04/2019 13:48, Andrew Price wrote:



On 09/04/2019 13:21, Steven Whitehouse wrote:

Hi,

On 09/04/2019 13:18, Andrew Price wrote:

On 09/04/2019 13:03, Christoph Hellwig wrote:

On Tue, Apr 09, 2019 at 10:41:53AM +0100, Andrew Price wrote:

Give gfs2-utils its own copy of gfs2_ondisk.h which uses userspace
types. This allows us to always support the latest ondisk 
structures and

obsoletes a lot of #ifdef GFS2_HAS_ blocks and configure.ac
checks.

gfs2_ondisk.h was changed simply by search-and-replace of the 
kernel int

types with the uintN_t, i.e.:

:%s/__u\(8\|16\|32\|64\)/uint\1_t/g
:%s/__be\(64\|32\|16\|8\)/uint\1_t/g

and the linux/types.h include replaced with stdint.h


Why?


Because I'd like to be able to build gfs2-utils on FreeBSD one day. 
Plus we get the handy stuff in inttypes.h to use, Linux doesn't have 
that.



At least the be types give you really useful type checking with
sparse, which can be trivially wired up in userspace as well.


If you mean the bitwise annotations that only sparse checks, we're 
fairly safe in gfs2-utils in that anything represented by a struct 
is going to have been parsed through one of the libgfs2/ondisk.c 
functions so will be the right endianness. I run sparse over this 
code very rarely anyway.


Those conversion functions are not sensible, thats why we got rid of 
them from the kernel code. 


Is it the functions that aren't sensible or the use of the 
gfs2_ondisk.h structs as the containers for the native endian data? 
I'm not sure I get why the kernel functions like gfs2_dinode_in() are 
considered sensible and gfs2-utils' gfs2_dinode_in(), which does a 
similar thing but with a different struct, isn't sensible.


Well in general we don't want to convert lots of fields in what is 
basically a copy. The inode, when it is read in is an exception to that 
mainly because we have to in order to make sure that the vfs level data 
is all up to date. Keeping the structs as containers is useful, so yes 
we want to retain that. In many cases though we only need a few fields 
from what can be quite large data structures, so in those cases we 
should read/update the fields that we care about for that particular 
operation, rather than converting the whole data structure each time. We 
got a fair speed up when we made that change in the kernel.


So generally I'd like to discourage the blanket conversion functions, 
though it is likely we'll need to retain a few of them, in favour of 
converting just the required fields at the point of use. This should be 
safe to do given that we have the ability to do compile time type 
checking - and lets try and include that in the tests that are always 
run before check in, to make sure that we don't land up with any 
mistakes. That would be a good addition to the tests I think,


Steve.




Re: [Cluster-devel] [PATCH] libgfs2: Import gfs2_ondisk.h

2019-04-09 Thread Andrew Price




On 09/04/2019 13:21, Steven Whitehouse wrote:

Hi,

On 09/04/2019 13:18, Andrew Price wrote:

On 09/04/2019 13:03, Christoph Hellwig wrote:

On Tue, Apr 09, 2019 at 10:41:53AM +0100, Andrew Price wrote:

Give gfs2-utils its own copy of gfs2_ondisk.h which uses userspace
types. This allows us to always support the latest ondisk structures 
and

obsoletes a lot of #ifdef GFS2_HAS_ blocks and configure.ac
checks.

gfs2_ondisk.h was changed simply by search-and-replace of the kernel 
int

types with the uintN_t, i.e.:

:%s/__u\(8\|16\|32\|64\)/uint\1_t/g
:%s/__be\(64\|32\|16\|8\)/uint\1_t/g

and the linux/types.h include replaced with stdint.h


Why?


Because I'd like to be able to build gfs2-utils on FreeBSD one day. 
Plus we get the handy stuff in inttypes.h to use, Linux doesn't have 
that.



At least the be types give you really useful type checking with
sparse, which can be trivially wired up in userspace as well.


If you mean the bitwise annotations that only sparse checks, we're 
fairly safe in gfs2-utils in that anything represented by a struct is 
going to have been parsed through one of the libgfs2/ondisk.c 
functions so will be the right endianness. I run sparse over this code 
very rarely anyway.


Those conversion functions are not sensible, thats why we got rid of 
them from the kernel code. 


Is it the functions that aren't sensible or the use of the gfs2_ondisk.h 
structs as the containers for the native endian data? I'm not sure I get 
why the kernel functions like gfs2_dinode_in() are considered sensible 
and gfs2-utils' gfs2_dinode_in(), which does a similar thing but with a 
different struct, isn't sensible.


It is better to have a set of types that have 
the endianess specified so that we can use sparse. Compile time checking 
is always a good plan where it is possible.


Okay, I'll add back the bitwise annotations through typedefs to stdint.h 
types in a new header but I don't want to name it linux/types.h to avoid 
picking up the wrong one, so I'll just change that #include in 
gfs2_ondisk.h.


Andy



Re: [Cluster-devel] [PATCH] libgfs2: Import gfs2_ondisk.h

2019-04-09 Thread Christoph Hellwig
On Tue, Apr 09, 2019 at 01:21:23PM +0100, Steven Whitehouse wrote:
> Those conversion functions are not sensible, thats why we got rid of them
> from the kernel code. It is better to have a set of types that have the
> endianess specified so that we can use sparse. Compile time checking is
> always a good plan where it is possible.

Yeah.  And  vs inttypes.h is no argument either,
you can define the __be types based on the inttypes.h types (which
really are stdint.h ones anyway).



Re: [Cluster-devel] gfs2 iomap dealock, IOMAP_F_UNBALANCED

2019-04-09 Thread Andreas Gruenbacher
On Tue, 9 Apr 2019 at 14:15, Christoph Hellwig  wrote:
> On Mon, Apr 08, 2019 at 03:44:05PM +0200, Jan Kara wrote:
> > > We won't be able to do a log flush while another transaction is
> > > active, but that's what's needed to clean dirty pages. iomap doesn't
> > > allow us to put the block allocation into a separate transaction from
> > > the page writes; for that, the opposite to the page_done hook would
> > > probably be needed.
> >
> > I agree that a ->page_prepare() hook would be probably the cleanest
> > solution for this.
>
> That doesn't sound too bad to me.

Okay, I'll see how the code for that will turn out.

Thanks,
Andreas



Re: [Cluster-devel] [PATCH] libgfs2: Import gfs2_ondisk.h

2019-04-09 Thread Steven Whitehouse

Hi,

On 09/04/2019 13:18, Andrew Price wrote:

On 09/04/2019 13:03, Christoph Hellwig wrote:

On Tue, Apr 09, 2019 at 10:41:53AM +0100, Andrew Price wrote:

Give gfs2-utils its own copy of gfs2_ondisk.h which uses userspace
types. This allows us to always support the latest ondisk structures 
and

obsoletes a lot of #ifdef GFS2_HAS_ blocks and configure.ac
checks.

gfs2_ondisk.h was changed simply by search-and-replace of the kernel 
int

types with the uintN_t, i.e.:

:%s/__u\(8\|16\|32\|64\)/uint\1_t/g
:%s/__be\(64\|32\|16\|8\)/uint\1_t/g

and the linux/types.h include replaced with stdint.h


Why?


Because I'd like to be able to build gfs2-utils on FreeBSD one day. 
Plus we get the handy stuff in inttypes.h to use, Linux doesn't have 
that.



At least the be types give you really useful type checking with
sparse, which can be trivially wired up in userspace as well.


If you mean the bitwise annotations that only sparse checks, we're 
fairly safe in gfs2-utils in that anything represented by a struct is 
going to have been parsed through one of the libgfs2/ondisk.c 
functions so will be the right endianness. I run sparse over this code 
very rarely anyway.


Those conversion functions are not sensible, thats why we got rid of 
them from the kernel code. It is better to have a set of types that have 
the endianess specified so that we can use sparse. Compile time checking 
is always a good plan where it is possible.






Also
keeping the file 1:1 the same is going to make your life much easier
in the future..


It's really no difficulty to run the above substitutions the next time 
the file changes, but gfs2_ondisk.h changes once in a blue moon anyway 
so the maintenance overhead is going to be tiny.


Andy


Thats true, but lets keep the ability to do endianess checks,

Steve.




Re: [Cluster-devel] [PATCH] libgfs2: Import gfs2_ondisk.h

2019-04-09 Thread Andrew Price

On 09/04/2019 13:03, Christoph Hellwig wrote:

On Tue, Apr 09, 2019 at 10:41:53AM +0100, Andrew Price wrote:

Give gfs2-utils its own copy of gfs2_ondisk.h which uses userspace
types. This allows us to always support the latest ondisk structures and
obsoletes a lot of #ifdef GFS2_HAS_ blocks and configure.ac
checks.

gfs2_ondisk.h was changed simply by search-and-replace of the kernel int
types with the uintN_t, i.e.:

:%s/__u\(8\|16\|32\|64\)/uint\1_t/g
:%s/__be\(64\|32\|16\|8\)/uint\1_t/g

and the linux/types.h include replaced with stdint.h


Why?


Because I'd like to be able to build gfs2-utils on FreeBSD one day. Plus 
we get the handy stuff in inttypes.h to use, Linux doesn't have that.



At least the be types give you really useful type checking with
sparse, which can be trivially wired up in userspace as well.


If you mean the bitwise annotations that only sparse checks, we're 
fairly safe in gfs2-utils in that anything represented by a struct is 
going to have been parsed through one of the libgfs2/ondisk.c functions 
so will be the right endianness. I run sparse over this code very rarely 
anyway.



Also
keeping the file 1:1 the same is going to make your life much easier
in the future..


It's really no difficulty to run the above substitutions the next time 
the file changes, but gfs2_ondisk.h changes once in a blue moon anyway 
so the maintenance overhead is going to be tiny.


Andy



Re: [Cluster-devel] gfs2 iomap dealock, IOMAP_F_UNBALANCED

2019-04-09 Thread Christoph Hellwig
On Mon, Apr 08, 2019 at 03:44:05PM +0200, Jan Kara wrote:
> > We won't be able to do a log flush while another transaction is
> > active, but that's what's needed to clean dirty pages. iomap doesn't
> > allow us to put the block allocation into a separate transaction from
> > the page writes; for that, the opposite to the page_done hook would
> > probably be needed.
> 
> I agree that a ->page_prepare() hook would be probably the cleanest
> solution for this.

That doesn't sound too bad to me.



Re: [Cluster-devel] [PATCH] libgfs2: Import gfs2_ondisk.h

2019-04-09 Thread Christoph Hellwig
On Tue, Apr 09, 2019 at 10:41:53AM +0100, Andrew Price wrote:
> Give gfs2-utils its own copy of gfs2_ondisk.h which uses userspace
> types. This allows us to always support the latest ondisk structures and
> obsoletes a lot of #ifdef GFS2_HAS_ blocks and configure.ac
> checks.
> 
> gfs2_ondisk.h was changed simply by search-and-replace of the kernel int
> types with the uintN_t, i.e.:
> 
> :%s/__u\(8\|16\|32\|64\)/uint\1_t/g
> :%s/__be\(64\|32\|16\|8\)/uint\1_t/g
> 
> and the linux/types.h include replaced with stdint.h

Why?  At least the be types give you really useful type checking with
sparse, which can be trivially wired up in userspace as well.  Also
keeping the file 1:1 the same is going to make your life much easier
in the future..



[Cluster-devel] [PATCH] libgfs2: Import gfs2_ondisk.h

2019-04-09 Thread Andrew Price
Give gfs2-utils its own copy of gfs2_ondisk.h which uses userspace
types. This allows us to always support the latest ondisk structures and
obsoletes a lot of #ifdef GFS2_HAS_ blocks and configure.ac
checks.

gfs2_ondisk.h was changed simply by search-and-replace of the kernel int
types with the uintN_t, i.e.:

:%s/__u\(8\|16\|32\|64\)/uint\1_t/g
:%s/__be\(64\|32\|16\|8\)/uint\1_t/g

and the linux/types.h include replaced with stdint.h

A bunch of format strings were updated to use PRI* from inttypes.h to
fix issues where the kernel types were typedef'd to long long and the
userspace types are just long, for instance, despite being the same
width on x86_64.

Signed-off-by: Andrew Price 
---
 configure.ac   |  16 -
 gfs2/convert/gfs2_convert.c|   3 +-
 gfs2/edit/extended.c   |  41 +--
 gfs2/edit/gfs2hex.c|  19 +-
 gfs2/edit/hexedit.c|   3 +-
 gfs2/edit/hexedit.h|   2 +-
 gfs2/edit/journal.c|  13 +-
 gfs2/edit/savemeta.c   |   3 +-
 gfs2/fsck/fs_recovery.c|  30 +-
 gfs2/fsck/initialize.c |  50 ++-
 gfs2/fsck/lost_n_found.c   |  12 +-
 gfs2/fsck/pass1.c  |   2 +-
 gfs2/fsck/pass2.c  |   2 +-
 gfs2/fsck/rgrepair.c   |  29 +-
 gfs2/include/gfs2_ondisk.h | 535 +
 gfs2/libgfs2/buf.c |   1 -
 gfs2/libgfs2/device_geometry.c |   1 -
 gfs2/libgfs2/fs_geometry.c |   1 -
 gfs2/libgfs2/fs_ops.c  |   7 -
 gfs2/libgfs2/gfs1.c|   3 +-
 gfs2/libgfs2/lang.c|  11 +-
 gfs2/libgfs2/libgfs2.h | 103 +++
 gfs2/libgfs2/meta.c|  31 +-
 gfs2/libgfs2/ondisk.c  | 261 ++--
 gfs2/libgfs2/recovery.c|   2 -
 gfs2/libgfs2/rgrp.c|   8 -
 gfs2/libgfs2/structures.c  |  22 +-
 gfs2/mkfs/gfs2_mkfs.h  |   2 +-
 gfs2/mkfs/main_grow.c  |   1 -
 gfs2/mkfs/main_jadd.c  |   7 -
 gfs2/mkfs/main_mkfs.c  |  14 +-
 gfs2/tune/super.c  |  18 +-
 32 files changed, 770 insertions(+), 483 deletions(-)
 create mode 100644 gfs2/include/gfs2_ondisk.h

diff --git a/configure.ac b/configure.ac
index 0c1b0192..51e0cc0a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -126,23 +126,7 @@ AC_SUBST([udevdir], [$with_udevdir])
 # Checks for header files.
 AC_CHECK_HEADERS([fcntl.h libintl.h limits.h locale.h mntent.h stddef.h 
sys/file.h sys/ioctl.h sys/mount.h sys/time.h sys/vfs.h syslog.h termios.h])
 AC_CHECK_HEADER([linux/fs.h], [], [AC_MSG_ERROR([Unable to find linux/fs.h])])
-AC_CHECK_HEADER([linux/types.h], [], [AC_MSG_ERROR([Unable to find 
linux/types.h])])
 AC_CHECK_HEADER([linux/limits.h], [], [AC_MSG_ERROR([Unable to find 
linux/limits.h])])
-AC_CHECK_HEADER([linux/gfs2_ondisk.h], [], [AC_MSG_ERROR([Unable to find 
linux/gfs2_ondisk.h])])
-AC_CHECK_MEMBER([struct gfs2_sb.sb_uuid], [sb_has_uuid=yes], [sb_has_uuid=no],
-[[#include ]])
-AC_CHECK_MEMBER([struct 
gfs2_leaf.lf_inode],[AC_DEFINE([GFS2_HAS_LEAF_HINTS],[],[Leaf block hints])],
-[], [[#include ]])
-AC_CHECK_MEMBER([struct 
gfs2_dirent.de_rahead],[AC_DEFINE([GFS2_HAS_DE_RAHEAD],[],[Dirent readahead 
field])],
-[], [[#include ]])
-AC_CHECK_MEMBER([struct 
gfs2_dirent.de_cookie],[AC_DEFINE([GFS2_HAS_DE_COOKIE],[],[Dirent cookie 
field])],
-[], [[#include ]])
-AC_CHECK_MEMBER([struct 
gfs2_rgrp.rg_skip],[AC_DEFINE([GFS2_HAS_RG_SKIP],[],[Next resource group 
pointer])],
-[], [[#include ]])
-AC_CHECK_MEMBER([struct 
gfs2_rgrp.rg_data0],[AC_DEFINE([GFS2_HAS_RG_RI_FIELDS],[],[Resource group 
fields duplicated from the rindex])],
-[], [[#include ]])
-AC_CHECK_MEMBER([struct 
gfs2_log_header.lh_crc],[AC_DEFINE([GFS2_HAS_LH_V2],[],[v2 log header format])],
-[], [[#include ]])
 
 # libuuid is only required if struct gfs2_sb.sb_uuid exists
 if test "$sb_has_uuid" = "yes" -a "$have_uuid" = "no"; then
diff --git a/gfs2/convert/gfs2_convert.c b/gfs2/convert/gfs2_convert.c
index 9cf97b65..558a935d 100644
--- a/gfs2/convert/gfs2_convert.c
+++ b/gfs2/convert/gfs2_convert.c
@@ -28,8 +28,7 @@
 #include 
 #define _(String) gettext(String)
 
-#include 
-#include 
+#include 
 #include 
 #include "osi_list.h"
 #include "copyright.cf"
diff --git a/gfs2/edit/extended.c b/gfs2/edit/extended.c
index d24d7550..c46e522e 100644
--- a/gfs2/edit/extended.c
+++ b/gfs2/edit/extended.c
@@ -6,7 +6,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -19,7 +18,7 @@
 #include 
 #include 
 
-#include 
+#include 
 #include "copyright.cf"
 
 #include "hexedit.h"
@@ -233,7 +232,7 @@ static int display_indirect(struct iinfo *ind, int 
indblocks, int level,
return 0;
 }
 
-static void print_inode_type(__be16 de_type)
+static void print_inode_type(uint16_t de_type)
 {
if (sbd.gfs1) {
switch(de_type) {
@@ -298,15 +297,6 @@ static void