Re: [Cluster-devel] [GFS2 PATCH v3 05/19] gfs2: Introduce concept of a pending withdraw
Bob, On Wed, 1 May 2019 at 01:03, Bob Peterson wrote: > File system withdraws can be delayed when inconsistencies are > discovered when we cannot withdraw immediately, for example, when > critical spin_locks are held. But delaying the withdraw can cause > gfs2 to ignore the error and keep running for a short period of time. > For example, an rgrp glock may be dequeued and demoted while there > are still buffers that haven't been properly revoked, due to io > errors writing to the journal. > > This patch introduces a new concept of a pending withdraw, which > means an inconsistency has been discovered and we need to withdraw > at the earliest possible opportunity. In these cases, we aren't > quite withdrawn yet, but we still need to not dequeue glocks and > other critical things. If we dequeue the glocks and the withdraw > results in our journal being replayed, the replay could overwrite > data that's been modified by a different node that acquired the > glock in the meantime. this is looking good. Maybe we can improve our terminology a bit though: how about SDF_FAULTY instead of SDF_WITHDRAW? Also, SDF_SHUTDOWN really indicates that a filesystem is being or has been withdrawn, so maybe that should really be called SDF_WITHDRAW? 
Thanks, Andreas > Signed-off-by: Bob Peterson > --- > fs/gfs2/aops.c | 4 ++-- > fs/gfs2/file.c | 2 +- > fs/gfs2/glock.c | 7 +++ > fs/gfs2/glops.c | 2 +- > fs/gfs2/incore.h | 1 + > fs/gfs2/log.c| 20 > fs/gfs2/meta_io.c| 6 +++--- > fs/gfs2/ops_fstype.c | 3 +-- > fs/gfs2/quota.c | 2 +- > fs/gfs2/super.c | 6 +++--- > fs/gfs2/sys.c| 2 +- > fs/gfs2/util.h | 12 > 12 files changed, 37 insertions(+), 30 deletions(-) > > diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c > index 05dd78f4b2b3..2bacce032a34 100644 > --- a/fs/gfs2/aops.c > +++ b/fs/gfs2/aops.c > @@ -521,7 +521,7 @@ static int __gfs2_readpage(void *file, struct page *page) > error = mpage_readpage(page, gfs2_block_map); > } > > - if (unlikely(test_bit(SDF_SHUTDOWN, >sd_flags))) > + if (unlikely(gfs2_withdrawn(sdp))) > return -EIO; > > return error; > @@ -638,7 +638,7 @@ static int gfs2_readpages(struct file *file, struct > address_space *mapping, > gfs2_glock_dq(); > out_uninit: > gfs2_holder_uninit(); > - if (unlikely(test_bit(SDF_SHUTDOWN, >sd_flags))) > + if (unlikely(gfs2_withdrawn(sdp))) > ret = -EIO; > return ret; > } > diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c > index 58a768e59712..bba39c73a43c 100644 > --- a/fs/gfs2/file.c > +++ b/fs/gfs2/file.c > @@ -1169,7 +1169,7 @@ static int gfs2_lock(struct file *file, int cmd, struct > file_lock *fl) > cmd = F_SETLK; > fl->fl_type = F_UNLCK; > } > - if (unlikely(test_bit(SDF_SHUTDOWN, >sd_flags))) { > + if (unlikely(gfs2_withdrawn(sdp))) { > if (fl->fl_type == F_UNLCK) > locks_lock_file_wait(file, fl); > return -EIO; > diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c > index 15c605cfcfc8..fe9f9a22426e 100644 > --- a/fs/gfs2/glock.c > +++ b/fs/gfs2/glock.c > @@ -547,7 +547,7 @@ __acquires(>gl_lockref.lock) > unsigned int lck_flags = (unsigned int)(gh ? 
gh->gh_flags : 0); > int ret; > > - if (unlikely(test_bit(SDF_SHUTDOWN, >sd_flags)) && > + if (unlikely(gfs2_withdrawn(sdp)) && > target != LM_ST_UNLOCKED) > return; > lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP | > @@ -584,8 +584,7 @@ __acquires(>gl_lockref.lock) > } > else if (ret) { > fs_err(sdp, "lm_lock ret %d\n", ret); > - GLOCK_BUG_ON(gl, !test_bit(SDF_SHUTDOWN, > - >sd_flags)); > + GLOCK_BUG_ON(gl, !gfs2_withdrawn(sdp)); > } > } else { /* lock_nolock */ > finish_xmote(gl, target); > @@ -1097,7 +1096,7 @@ int gfs2_glock_nq(struct gfs2_holder *gh) > struct gfs2_sbd *sdp = gl->gl_name.ln_sbd; > int error = 0; > > - if (unlikely(test_bit(SDF_SHUTDOWN, >sd_flags))) > + if (unlikely(gfs2_withdrawn(sdp))) > return -EIO; > > if (test_bit(GLF_LRU, >gl_flags)) > diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c > index 78510ab91835..b9f36a85a4d4 100644 > --- a/fs/gfs2/glops.c > +++ b/fs/gfs2/glops.c > @@ -538,7 +538,7 @@ static int freeze_go_xmote_bh(struct gfs2_glock *gl, > struct gfs2_holder *gh) > gfs2_consist(sdp); > > /* Initialize some head of the log stuff */ > - if (!test_bit(SDF_SHUTDOWN, >sd_flags)) { > + if (!gfs2_withdrawn(sdp)) { > sdp->sd_log_sequence = head.lh_sequence + 1;
[Cluster-devel] [GIT PULL] iomap: cleanups and enhancements for 5.2
Hi Linus, Here are some patches for the iomap code for 5.2. Nothing particularly exciting here, just adding some callouts for gfs2 and cleaning a few things. It merges cleanly against this morning's HEAD and survived an overnight run of xfstests. Let me know if you run into anything weird. --D The following changes since commit dc4060a5dc2557e6b5aa813bf5b73677299d62d2: Linux 5.1-rc5 (2019-04-14 15:17:41 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git tags/iomap-5.2-merge-2 for you to fetch changes up to cbbf4c0be8a725f08153949f45a85b2adafbbbd3: iomap: move iomap_read_inline_data around (2019-05-01 20:16:40 -0700) Changes for Linux 5.2: - Add some extra hooks to the iomap buffered write path to enable gfs2 journalled writes. - SPDX conversion - Various refactoring. Andreas Gruenbacher (3): fs: Turn __generic_write_end into a void function iomap: Fix use-after-free error in page_done callback iomap: Add a page_prepare callback Christoph Hellwig (3): iomap: convert to SPDX identifier iomap: Clean up __generic_write_end calling iomap: move iomap_read_inline_data around fs/buffer.c | 8 ++-- fs/gfs2/bmap.c| 15 +--- fs/internal.h | 2 +- fs/iomap.c| 105 +++--- include/linux/iomap.h | 22 --- 5 files changed, 88 insertions(+), 64 deletions(-)
Re: [Cluster-devel] [GFS2 PATCH v3 04/19] gfs2: Warn when a journal replay overwrites a rgrp with buffers
Bob, On Wed, 1 May 2019 at 01:03, Bob Peterson wrote: > This patch adds some instrumentation in gfs2's journal replay that > indicates when we're about to overwrite a rgrp for which we already > have a valid buffer_head. looks okay, but can you explain in the commit message when this problem will trigger and why that's a problem? Thanks, Andreas > Signed-off-by: Bob Peterson > --- > fs/gfs2/lops.c | 22 -- > 1 file changed, 20 insertions(+), 2 deletions(-) > > diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c > index 6af6a3cea967..2e8c6d02e112 100644 > --- a/fs/gfs2/lops.c > +++ b/fs/gfs2/lops.c > @@ -564,9 +564,27 @@ static int buf_lo_scan_elements(struct gfs2_jdesc *jd, > u32 start, > > if (gfs2_meta_check(sdp, bh_ip)) > error = -EIO; > - else > + else { > + struct gfs2_meta_header *mh = > + (struct gfs2_meta_header *)bh_ip->b_data; > + > + if (mh->mh_type == cpu_to_be32(GFS2_METATYPE_RG)) { > + struct gfs2_rgrpd *rgd; > + > + rgd = gfs2_blk2rgrpd(sdp, blkno, false); > + if (rgd && rgd->rd_addr == blkno && > + rgd->rd_bits && rgd->rd_bits->bi_bh) { > + fs_info(sdp, "Replaying 0x%llx but we > " > + "already have a bh!\n", > + (unsigned long long)blkno); > + fs_info(sdp, "busy:%d, pinned:%d\n", > + > buffer_busy(rgd->rd_bits->bi_bh) ? 1 : 0, > + > buffer_pinned(rgd->rd_bits->bi_bh)); > + gfs2_dump_glock(NULL, rgd->rd_gl); > + } > + } > mark_buffer_dirty(bh_ip); > - > + } > brelse(bh_log); > brelse(bh_ip); > > -- > 2.20.1 >
[Cluster-devel] [PATCH 1/1] dlm_controld: bind to all interfaces for failover
Support for automatic failover in the face of network interruptions is
being added to the DLM kernel component. [1] This patch aids in that
effort by adding a mechanism whereby userspace can convey to the kernel
its intention to use all network addresses for automatic failover.

DLM's current default behavior is to bind to only a single interface.
When --bind_all is set, dlm_controld will write to a configfs node that
alerts the kernel of its intention to use all local network addresses
for automatic failover.

When selecting the next address to bind to, DLM will iterate through its
list of local network addresses in a round-robin fashion. Support for
other address selection heuristics may be added in the future.

It is important to understand that, per the DLM spec, while DLM can use
a set of addresses for automatic failover, only one address is
considered the active address between two DLM nodes at a time. This
patch does not violate that constraint.

[1] https://www.redhat.com/archives/cluster-devel/2019-January/msg9.html

Signed-off-by: David Windsor
---
 dlm_controld/action.c | 19 +++
 dlm_controld/dlm.conf.5 | 2 ++
 dlm_controld/dlm_daemon.h | 1 +
 dlm_controld/main.c | 5 +
 4 files changed, 27 insertions(+)

diff --git a/dlm_controld/action.c b/dlm_controld/action.c
index 84637f15..ecd0d022 100644
--- a/dlm_controld/action.c
+++ b/dlm_controld/action.c
@@ -662,6 +662,25 @@ int add_configfs_node(int nodeid, char *addr, int addrlen, int local)
 		return -1;
 	}
 	close(fd);
+
+	if (opt(bind_all_ind)) {
+		memset(path, 0, PATH_MAX);
+		snprintf(path, PATH_MAX, "%s/%d/bind_all", COMMS_DIR, nodeid);
+
+		fd = open(path, O_WRONLY);
+		if (fd < 0) {
+			log_error("%s: open failed: %d", path, errno);
+			return -1;
+		}
+
+		rv = do_write(fd, (void *)"1", strlen("1"));
+		if (rv < 0) {
+			log_error("%s: write failed: %d", path, errno);
+			close(fd);
+			return -1;
+		}
+		close(fd);
+	}
  out:
 	return 0;
 }
diff --git a/dlm_controld/dlm.conf.5 b/dlm_controld/dlm.conf.5
index 616b60da..09492176 100644
--- a/dlm_controld/dlm.conf.5
+++ b/dlm_controld/dlm.conf.5
@@ -38,6 +38,8 @@ log_debug
 .br
 protocol
 .br
+bind_all
+.br
 debug_logfile
 .br
 enable_plock
diff --git a/dlm_controld/dlm_daemon.h b/dlm_controld/dlm_daemon.h
index 1182c971..3221e19c 100644
--- a/dlm_controld/dlm_daemon.h
+++ b/dlm_controld/dlm_daemon.h
@@ -95,6 +95,7 @@ enum {
 	timewarn_ind,
 	protocol_ind,
 	debug_logfile_ind,
+	bind_all_ind,
 	enable_fscontrol_ind,
 	enable_plock_ind,
 	plock_debug_ind,
diff --git a/dlm_controld/main.c b/dlm_controld/main.c
index 1b60ccda..8be6a4bc 100644
--- a/dlm_controld/main.c
+++ b/dlm_controld/main.c
@@ -1727,6 +1727,11 @@ static void set_opt_defaults(void)
 			-1, "detect",
 			"dlm kernel lowcomms protocol: tcp, sctp, detect");
 
+	set_opt_default(bind_all_ind,
+			"bind_all", '\0', req_arg_int,
+			0, NULL,
+			""); /* do not advertise */
+
 	set_opt_default(debug_logfile_ind,
 			"debug_logfile", 'L', no_arg,
 			0, NULL,
--
2.20.1
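For readers who want to try the userspace half of this in isolation, the following is a minimal standalone sketch of the "open an existing attribute file and write 1 to it" step from the action.c hunk above. It is not dlm_controld code: set_flag_attr() and write_all() are hypothetical names, write_all() only approximates what a helper like do_write() presumably does, and the path is passed in by the caller rather than built from COMMS_DIR.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying on EINTR and on short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
	assert(buf != NULL);
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: set an existing attribute file to "1".
 * The file is deliberately not created when it is missing, since a
 * configfs attribute only exists once the kernel exposes it.
 */
static int set_flag_attr(const char *path)
{
	int fd, rv;

	fd = open(path, O_WRONLY);
	if (fd < 0) {
		fprintf(stderr, "%s: open failed: %d\n", path, errno);
		return -1;
	}
	rv = write_all(fd, "1", strlen("1"));
	if (rv < 0)
		fprintf(stderr, "%s: write failed: %d\n", path, errno);
	close(fd);	/* closed on both the success and the error path */
	return rv;
}
```

Opening without O_CREAT means that a kernel lacking the bind_all attribute shows up as an open failure instead of a silently created stray file.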
[Cluster-devel] [GFS2 PATCH 05/12] gfs2: Replace gl_revokes with a GLF flag
From: Bob Peterson The gl_revokes value determines how many outstanding revokes a glock has on the superblock revokes list; this is used to avoid unnecessary log flushes. However, gl_revokes is only ever tested for being zero, and it's only decremented in revoke_lo_after_commit, which removes all revokes from the list, so we know that the gl_revoke values of all the glocks on the list will reach zero. Therefore, we can replace gl_revokes with a bit flag. This saves an atomic counter in struct gfs2_glock. Signed-off-by: Bob Peterson Signed-off-by: Andreas Gruenbacher --- fs/gfs2/glock.c | 4 ++-- fs/gfs2/incore.h | 2 +- fs/gfs2/log.c| 4 +++- fs/gfs2/lops.c | 33 - fs/gfs2/main.c | 1 - fs/gfs2/super.c | 2 +- 6 files changed, 31 insertions(+), 15 deletions(-) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 71c28ff98b56..15c605cfcfc8 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -140,7 +140,7 @@ void gfs2_glock_free(struct gfs2_glock *gl) { struct gfs2_sbd *sdp = gl->gl_name.ln_sbd; - BUG_ON(atomic_read(>gl_revokes)); + BUG_ON(test_bit(GLF_REVOKES, >gl_flags)); rhashtable_remove_fast(_hash_table, >gl_node, ht_parms); smp_mb(); wake_up_glock(gl); @@ -1801,7 +1801,7 @@ void gfs2_dump_glock(struct seq_file *seq, struct gfs2_glock *gl) state2str(gl->gl_target), state2str(gl->gl_demote_state), dtime, atomic_read(>gl_ail_count), - atomic_read(>gl_revokes), + test_bit(GLF_REVOKES, >gl_flags) ? 
1 : 0, (int)gl->gl_lockref.count, gl->gl_hold_time); list_for_each_entry(gh, >gl_holders, gh_list) diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index 86840a70ee1a..6a94b094a904 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -345,6 +345,7 @@ enum { GLF_OBJECT = 14, /* Used only for tracing */ GLF_BLOCKING= 15, GLF_INODE_CREATING = 16, /* Inode creation occurring */ + GLF_REVOKES = 17, /* Glock has revokes in queue */ }; struct gfs2_glock { @@ -374,7 +375,6 @@ struct gfs2_glock { struct list_head gl_lru; struct list_head gl_ail_list; atomic_t gl_ail_count; - atomic_t gl_revokes; struct delayed_work gl_work; union { /* For inode and iopen glocks only */ diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index 7ba94b66c91b..d55315a46ece 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -606,8 +606,10 @@ void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd) gfs2_remove_from_ail(bd); /* drops ref on bh */ bd->bd_bh = NULL; sdp->sd_log_num_revoke++; - if (atomic_inc_return(>gl_revokes) == 1) + if (!test_bit(GLF_REVOKES, >gl_flags)) { + set_bit(GLF_REVOKES, >gl_flags); gfs2_glock_hold(gl); + } set_bit(GLF_LFLUSH, >gl_flags); list_add(>bd_list, >sd_log_le_revoke); } diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c index b4f0a6a3ba59..2fd61853ba63 100644 --- a/fs/gfs2/lops.c +++ b/fs/gfs2/lops.c @@ -662,19 +662,34 @@ static void revoke_lo_before_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr) static void revoke_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr) { struct list_head *head = >sd_log_le_revoke; - struct gfs2_bufdata *bd; - struct gfs2_glock *gl; + struct gfs2_bufdata *bd, *tmp; - while (!list_empty(head)) { - bd = list_entry(head->next, struct gfs2_bufdata, bd_list); - list_del_init(>bd_list); - gl = bd->bd_gl; - if (atomic_dec_return(>gl_revokes) == 0) { - clear_bit(GLF_LFLUSH, >gl_flags); - gfs2_glock_queue_put(gl); + /* +* Glocks can be referenced repeatedly on the revoke list, but the list +* only holds one reference. 
All glocks on the list will have the +* GLF_REVOKES flag set initially. +*/ + + list_for_each_entry_safe(bd, tmp, head, bd_list) { + struct gfs2_glock *gl = bd->bd_gl; + + if (test_bit(GLF_REVOKES, >gl_flags)) { + /* Keep each glock on the list exactly once. */ + clear_bit(GLF_REVOKES, >gl_flags); + continue; } + list_del(>bd_list); + kmem_cache_free(gfs2_bufdata_cachep, bd); + } + list_for_each_entry_safe(bd, tmp, head, bd_list) { + struct gfs2_glock *gl = bd->bd_gl; + + list_del(>bd_list); kmem_cache_free(gfs2_bufdata_cachep, bd); + clear_bit(GLF_LFLUSH, >gl_flags); + gfs2_glock_queue_put(gl); } + /* the list is empty now */ } static void revoke_lo_before_scan(struct gfs2_jdesc
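The two-pass cleanup in the revoke_lo_after_commit hunk above can be hard to follow in diff form, so here is a toy userspace model of it. This is only a sketch under simplifying assumptions: the revoke list is a plain array, there is no locking, and every toy_* name is made up for illustration and does not exist in gfs2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_glock {
	int refcount;		/* stands in for gl_lockref.count */
	bool has_revokes;	/* stands in for GLF_REVOKES */
};

struct toy_bd {
	struct toy_glock *gl;
	bool on_list;		/* still on the revoke list? */
};

/* Queue a revoke; the list takes one reference per glock, not per entry. */
static void toy_add_revoke(struct toy_glock *gl, struct toy_bd *bd)
{
	if (!gl->has_revokes) {
		gl->has_revokes = true;
		gl->refcount++;
	}
	bd->gl = gl;
	bd->on_list = true;
}

/* Two-pass cleanup once the revokes have been committed. */
static void toy_after_commit(struct toy_bd *bds, size_t n)
{
	size_t i;

	/* Pass 1: keep the first entry of each glock, drop duplicates. */
	for (i = 0; i < n; i++) {
		if (!bds[i].on_list)
			continue;
		if (bds[i].gl->has_revokes) {
			bds[i].gl->has_revokes = false;
			continue;
		}
		bds[i].on_list = false;
	}

	/* Pass 2: each glock now appears once; drop the list's reference. */
	for (i = 0; i < n; i++) {
		if (!bds[i].on_list)
			continue;
		bds[i].on_list = false;
		assert(bds[i].gl->refcount > 0);
		bds[i].gl->refcount--;
	}
}
```

With three queued entries for glocks A, B and A again, each glock gains exactly one reference when its first revoke is queued and loses exactly one in the second pass, which is the property the counter used to provide.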
[Cluster-devel] [GFS2 PATCH 04/12] gfs2: Fix occasional glock use-after-free
This patch has to do with the life cycle of glocks and buffers. When
gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata
object is assigned to track the buffer, and that is queued to various
lists, including the glock's gl_ail_list to indicate it's on the active
items list. Once the page associated with the buffer has been written,
it is removed from the ail list, but its life isn't over until a revoke
has been successfully written.

So after the block is written, its bufdata object is moved from the
glock's gl_ail_list to a file-system-wide list of pending revokes,
sd_log_le_revoke. At that point the glock still needs to track how many
revokes it contributed to that list (in gl_revokes) so that things like
glock go_sync can ensure all the metadata has been not only written, but
also revoked before the glock is granted to a different node. This is
to guarantee journal replay doesn't replay the block once the glock has
been granted to another node.

Ross Lagerwall recently discovered a race in which an inode could be
evicted, and its glock freed after its ail list had been synced, but
while it still had unwritten revokes on the sd_log_le_revoke list. The
evict decremented the glock reference count to zero, which allowed the
glock to be freed. After the revoke was written, function
revoke_lo_after_commit tried to adjust the glock's gl_revokes counter
and clear its GLF_LFLUSH flag, at which time it referenced the freed
glock.

This patch fixes the problem by incrementing the glock reference count
in gfs2_add_revoke when the glock's first bufdata object is moved from
the glock to the global revokes list. Later, when the glock's last such
bufdata object is freed, the reference count is decremented. This
guarantees that whichever process finishes last (the revoke writing or
the evict) will properly free the glock, and neither will reference the
glock after it has been freed.

Reported-by: Ross Lagerwall
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Bob Peterson
---
 fs/gfs2/glock.c | 1 +
 fs/gfs2/log.c | 3 ++-
 fs/gfs2/lops.c | 6 --
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index e4f6d39500bc..71c28ff98b56 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -140,6 +140,7 @@ void gfs2_glock_free(struct gfs2_glock *gl)
 {
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 
+	BUG_ON(atomic_read(&gl->gl_revokes));
 	rhashtable_remove_fast(&gl_hash_table, &gl->gl_node, ht_parms);
 	smp_mb();
 	wake_up_glock(gl);
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index ebbc68dca145..7ba94b66c91b 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -606,7 +606,8 @@ void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
 	gfs2_remove_from_ail(bd); /* drops ref on bh */
 	bd->bd_bh = NULL;
 	sdp->sd_log_num_revoke++;
-	atomic_inc(&gl->gl_revokes);
+	if (atomic_inc_return(&gl->gl_revokes) == 1)
+		gfs2_glock_hold(gl);
 	set_bit(GLF_LFLUSH, &gl->gl_flags);
 	list_add(&bd->bd_list, &sdp->sd_log_le_revoke);
 }
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index aef21b6a608f..b4f0a6a3ba59 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -669,8 +669,10 @@ static void revoke_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 		bd = list_entry(head->next, struct gfs2_bufdata, bd_list);
 		list_del_init(&bd->bd_list);
 		gl = bd->bd_gl;
-		atomic_dec(&gl->gl_revokes);
-		clear_bit(GLF_LFLUSH, &gl->gl_flags);
+		if (atomic_dec_return(&gl->gl_revokes) == 0) {
+			clear_bit(GLF_LFLUSH, &gl->gl_flags);
+			gfs2_glock_queue_put(gl);
+		}
 		kmem_cache_free(gfs2_bufdata_cachep, bd);
 	}
 }
--
2.20.1
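The "hold on the first revoke, put on the last" rule described above can be modelled in a few lines of userspace C. This is a toy sketch, not gfs2 code: it is single-threaded, uses plain ints instead of atomics, and all toy_* names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_glock {
	int refcount;	/* stands in for the glock reference count */
	int revokes;	/* stands in for gl_revokes */
	bool freed;
};

/* Drop one reference; the glock is freed when the last one goes away. */
static void toy_put(struct toy_glock *gl)
{
	assert(!gl->freed);		/* touching a freed glock is the bug */
	if (--gl->refcount == 0) {
		assert(gl->revokes == 0);	/* mirrors the new BUG_ON */
		gl->freed = true;
	}
}

/* First pending revoke pins the glock with an extra reference. */
static void toy_add_revoke(struct toy_glock *gl)
{
	if (++gl->revokes == 1)
		gl->refcount++;
}

/* Last committed revoke releases the reference taken above. */
static void toy_revoke_committed(struct toy_glock *gl)
{
	assert(!gl->freed);
	if (--gl->revokes == 0)
		toy_put(gl);
}
```

Starting from one reference, queueing two revokes, then doing the evict-side toy_put() before the two toy_revoke_committed() calls leaves the glock alive until the second commit, so neither path touches it after it has been freed.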
[Cluster-devel] [GFS2 PATCH 02/12] gfs2: Fix lru_count going negative
From: Ross Lagerwall

Under certain conditions, lru_count may drop below zero resulting in
a large amount of log spam like this:

vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
    negative objects to delete nr=-1

This happens as follows:
1) A glock is moved from lru_list to the dispose list and lru_count is
   decremented.
2) The dispose function calls cond_resched() and drops the lru lock.
3) Another thread takes the lru lock and tries to add the same glock to
   lru_list, checking if the glock is on an lru list.
4) It is on a list (actually the dispose list) and so it avoids
   incrementing lru_count.
5) The glock is moved to lru_list.
6) The original thread doesn't dispose it because it has been re-added
   to the lru list but the lru_count has still decreased by one.

Fix by checking if the LRU flag is set on the glock rather than checking
if the glock is on some list and rearrange the code so that the LRU flag
is added/removed precisely when the glock is added/removed from lru_list.

Signed-off-by: Ross Lagerwall
Signed-off-by: Andreas Gruenbacher
---
 fs/gfs2/glock.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index d32964cd1117..e4f6d39500bc 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -183,15 +183,19 @@ static int demote_ok(const struct gfs2_glock *gl)
 
 void gfs2_glock_add_to_lru(struct gfs2_glock *gl)
 {
+	if (!(gl->gl_ops->go_flags & GLOF_LRU))
+		return;
+
 	spin_lock(&lru_lock);
 
-	if (!list_empty(&gl->gl_lru))
-		list_del_init(&gl->gl_lru);
-	else
+	list_del(&gl->gl_lru);
+	list_add_tail(&gl->gl_lru, &lru_list);
+
+	if (!test_bit(GLF_LRU, &gl->gl_flags)) {
+		set_bit(GLF_LRU, &gl->gl_flags);
 		atomic_inc(&lru_count);
+	}
 
-	list_add_tail(&gl->gl_lru, &lru_list);
-	set_bit(GLF_LRU, &gl->gl_flags);
 	spin_unlock(&lru_lock);
 }
 
@@ -201,7 +205,7 @@ static void gfs2_glock_remove_from_lru(struct gfs2_glock *gl)
 		return;
 
 	spin_lock(&lru_lock);
-	if (!list_empty(&gl->gl_lru)) {
+	if (test_bit(GLF_LRU, &gl->gl_flags)) {
 		list_del_init(&gl->gl_lru);
 		atomic_dec(&lru_count);
 		clear_bit(GLF_LRU, &gl->gl_flags);
@@ -1159,8 +1163,7 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
 		    !test_bit(GLF_DEMOTE, &gl->gl_flags))
 			fast_path = 1;
 	}
-	if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl) &&
-	    (glops->go_flags & GLOF_LRU))
+	if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl))
 		gfs2_glock_add_to_lru(gl);
 
 	trace_gfs2_glock_queue(gh, 0);
@@ -1456,6 +1459,7 @@ __acquires(&lru_lock)
 		if (!spin_trylock(&gl->gl_lockref.lock)) {
 add_back_to_lru:
 			list_add(&gl->gl_lru, &lru_list);
+			set_bit(GLF_LRU, &gl->gl_flags);
 			atomic_inc(&lru_count);
 			continue;
 		}
@@ -1463,7 +1467,6 @@ __acquires(&lru_lock)
 			spin_unlock(&gl->gl_lockref.lock);
 			goto add_back_to_lru;
 		}
-		clear_bit(GLF_LRU, &gl->gl_flags);
 		gl->gl_lockref.count++;
 		if (demote_ok(gl))
 			handle_callback(gl, LM_ST_UNLOCKED, 0, false);
@@ -1498,6 +1501,7 @@ static long gfs2_scan_glock_lru(int nr)
 		if (!test_bit(GLF_LOCK, &gl->gl_flags)) {
 			list_move(&gl->gl_lru, &dispose);
 			atomic_dec(&lru_count);
+			clear_bit(GLF_LRU, &gl->gl_flags);
 			freed++;
 			continue;
 		}
--
2.20.1
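The accounting error described in the commit message can be replayed with a small userspace model. This is a sketch only: it is single-threaded (the interleaving is replayed by hand), the lists are reduced to an enum saying which list the glock sits on, and the toy_* names and the "fixed" switch are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_list { TOY_NONE, TOY_LRU, TOY_DISPOSE };

struct toy_glock {
	enum toy_list list;	/* which list gl_lru is linked into */
	bool lru_flag;		/* stands in for GLF_LRU */
};

static int toy_lru_count;	/* stands in for lru_count */

/* Old check: "on some list" means counted. New check: the flag does. */
static bool toy_is_counted(const struct toy_glock *gl, bool fixed)
{
	return fixed ? gl->lru_flag : (gl->list != TOY_NONE);
}

static void toy_add_to_lru(struct toy_glock *gl, bool fixed)
{
	bool counted = toy_is_counted(gl, fixed);

	gl->list = TOY_LRU;
	gl->lru_flag = true;
	if (!counted)
		toy_lru_count++;
}

/* The LRU scan moves the glock to a private dispose list. */
static void toy_move_to_dispose(struct toy_glock *gl, bool fixed)
{
	gl->list = TOY_DISPOSE;
	toy_lru_count--;
	if (fixed)
		gl->lru_flag = false;	/* the flag now tracks lru_list only */
}

static void toy_remove_from_lru(struct toy_glock *gl, bool fixed)
{
	if (!toy_is_counted(gl, fixed))
		return;
	gl->list = TOY_NONE;
	gl->lru_flag = false;
	toy_lru_count--;
}

/* Replay the interleaving from the commit message, then remove the glock. */
static int toy_replay_race(bool fixed)
{
	struct toy_glock gl = { TOY_NONE, false };

	toy_lru_count = 0;
	toy_add_to_lru(&gl, fixed);		/* glock starts out on lru_list */
	toy_move_to_dispose(&gl, fixed);	/* scan moves it, count drops */
	toy_add_to_lru(&gl, fixed);		/* other thread re-adds it */
	toy_remove_from_lru(&gl, fixed);	/* later removal */
	return toy_lru_count;
}
```

In this model toy_replay_race(false) ends at -1, matching the "negative objects to delete" symptom, while toy_replay_race(true) ends at 0.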
[Cluster-devel] [GFS2 PATCH 07/12] gfs2: Remove unnecessary extern declarations
Make log operations static; they are only used locally.

Signed-off-by: Andreas Gruenbacher
---
 fs/gfs2/lops.c | 6 +++---
 fs/gfs2/lops.h | 5 -
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 2fd61853ba63..6c1ec6c60639 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -870,7 +870,7 @@ static void databuf_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 }
 
 
-const struct gfs2_log_operations gfs2_buf_lops = {
+static const struct gfs2_log_operations gfs2_buf_lops = {
 	.lo_before_commit = buf_lo_before_commit,
 	.lo_after_commit = buf_lo_after_commit,
 	.lo_before_scan = buf_lo_before_scan,
@@ -879,7 +879,7 @@ const struct gfs2_log_operations gfs2_buf_lops = {
 	.lo_name = "buf",
 };
 
-const struct gfs2_log_operations gfs2_revoke_lops = {
+static const struct gfs2_log_operations gfs2_revoke_lops = {
 	.lo_before_commit = revoke_lo_before_commit,
 	.lo_after_commit = revoke_lo_after_commit,
 	.lo_before_scan = revoke_lo_before_scan,
@@ -888,7 +888,7 @@ const struct gfs2_log_operations gfs2_revoke_lops = {
 	.lo_name = "revoke",
 };
 
-const struct gfs2_log_operations gfs2_databuf_lops = {
+static const struct gfs2_log_operations gfs2_databuf_lops = {
 	.lo_before_commit = databuf_lo_before_commit,
 	.lo_after_commit = databuf_lo_after_commit,
 	.lo_scan_elements = databuf_lo_scan_elements,
diff --git a/fs/gfs2/lops.h b/fs/gfs2/lops.h
index 4e81742de7a0..320fbf28d2fb 100644
--- a/fs/gfs2/lops.h
+++ b/fs/gfs2/lops.h
@@ -20,11 +20,6 @@
 	((sizeof(struct gfs2_log_descriptor) + (2 * sizeof(__be64) - 1)) & \
 	 ~(2 * sizeof(__be64) - 1))
 
-extern const struct gfs2_log_operations gfs2_glock_lops;
-extern const struct gfs2_log_operations gfs2_buf_lops;
-extern const struct gfs2_log_operations gfs2_revoke_lops;
-extern const struct gfs2_log_operations gfs2_databuf_lops;
-
 extern const struct gfs2_log_operations *gfs2_log_ops[];
 extern u64 gfs2_log_bmap(struct gfs2_sbd *sdp);
 extern void gfs2_log_write(struct gfs2_sbd *sdp, struct page *page,
--
2.20.1
[Cluster-devel] [GFS2 PATCH 10/12] gfs2: fix race between gfs2_freeze_func and unmount
From: Abhi Das As part of the freeze operation, gfs2_freeze_func() is left blocking on a request to hold the sd_freeze_gl in SH. This glock is held in EX by the gfs2_freeze() code. A subsequent call to gfs2_unfreeze() releases the EXclusively held sd_freeze_gl, which allows gfs2_freeze_func() to acquire it in SH and resume its operation. gfs2_unfreeze(), however, doesn't wait for gfs2_freeze_func() to complete. If a umount is issued right after unfreeze, it could result in an inconsistent filesystem because some journal data (statfs update) isn't written out. Refer to commit 24972557b12c for a more detailed explanation of how freeze/unfreeze work. This patch causes gfs2_unfreeze() to wait for gfs2_freeze_func() to complete before returning to the user. Signed-off-by: Abhi Das Signed-off-by: Andreas Gruenbacher --- fs/gfs2/incore.h | 1 + fs/gfs2/super.c | 8 +--- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index 78c8e761b321..b15755068593 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -621,6 +621,7 @@ enum { SDF_SKIP_DLM_UNLOCK = 8, SDF_FORCE_AIL_FLUSH = 9, SDF_AIL1_IO_ERROR = 10, + SDF_FS_FROZEN = 11, }; enum gfs2_freeze_state { diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index a6a325b2a78b..ceec631efa49 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -973,8 +973,7 @@ void gfs2_freeze_func(struct work_struct *work) if (error) { printk(KERN_INFO "GFS2: couldn't get freeze lock : %d\n", error); gfs2_assert_withdraw(sdp, 0); - } - else { + } else { atomic_set(>sd_freeze_state, SFS_UNFROZEN); error = thaw_super(sb); if (error) { @@ -987,6 +986,8 @@ void gfs2_freeze_func(struct work_struct *work) gfs2_glock_dq_uninit(_gh); } deactivate_super(sb); + clear_bit_unlock(SDF_FS_FROZEN, >sd_flags); + wake_up_bit(>sd_flags, SDF_FS_FROZEN); return; } @@ -1029,6 +1030,7 @@ static int gfs2_freeze(struct super_block *sb) msleep(1000); } error = 0; + set_bit(SDF_FS_FROZEN, >sd_flags); out: 
mutex_unlock(>sd_freeze_mutex); return error; @@ -1053,7 +1055,7 @@ static int gfs2_unfreeze(struct super_block *sb) gfs2_glock_dq_uninit(>sd_freeze_gh); mutex_unlock(>sd_freeze_mutex); - return 0; + return wait_on_bit(>sd_flags, SDF_FS_FROZEN, TASK_INTERRUPTIBLE); } /** -- 2.20.1
[Cluster-devel] [GFS2 PATCH 11/12] gfs2: Fix iomap write page reclaim deadlock
Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing buffered writes by starting a transaction in iomap_begin, writing a range of pages, and ending that transaction in iomap_end. This approach suffers from two problems: (1) Any allocations necessary for the write are done in iomap_begin, so when the data aren't journaled, there is no need for keeping the transaction open until iomap_end. (2) Transactions keep the gfs2 log flush lock held. When iomap_file_buffered_write calls balance_dirty_pages, this can end up calling gfs2_write_inode, which will try to flush the log. This requires taking the log flush lock which is already held, resulting in a deadlock. Fix both of these issues by not keeping transactions open from iomap_begin to iomap_end. Instead, start a small transaction in page_prepare and end it in page_done when necessary. Reported-by: Edwin Török Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support") Signed-off-by: Andreas Gruenbacher Signed-off-by: Bob Peterson --- fs/gfs2/aops.c | 14 +--- fs/gfs2/bmap.c | 88 +++--- 2 files changed, 58 insertions(+), 44 deletions(-) diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c index 05dd78f4b2b3..6210d4429d84 100644 --- a/fs/gfs2/aops.c +++ b/fs/gfs2/aops.c @@ -649,7 +649,7 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping, */ void adjust_fs_space(struct inode *inode) { - struct gfs2_sbd *sdp = inode->i_sb->s_fs_info; + struct gfs2_sbd *sdp = GFS2_SB(inode); struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode); struct gfs2_inode *l_ip = GFS2_I(sdp->sd_sc_inode); struct gfs2_statfs_change_host *m_sc = >sd_statfs_master; @@ -657,10 +657,13 @@ void adjust_fs_space(struct inode *inode) struct buffer_head *m_bh, *l_bh; u64 fs_total, new_free; + if (gfs2_trans_begin(sdp, 2 * RES_STATFS, 0) != 0) + return; + /* Total up the file system space, according to the latest rindex. 
*/ fs_total = gfs2_ri_total(sdp); if (gfs2_meta_inode_buffer(m_ip, _bh) != 0) - return; + goto out; spin_lock(>sd_statfs_spin); gfs2_statfs_change_in(m_sc, m_bh->b_data + @@ -675,11 +678,14 @@ void adjust_fs_space(struct inode *inode) gfs2_statfs_change(sdp, new_free, new_free, 0); if (gfs2_meta_inode_buffer(l_ip, _bh) != 0) - goto out; + goto out2; update_statfs(sdp, m_bh, l_bh); brelse(l_bh); -out: +out2: brelse(m_bh); +out: + sdp->sd_rindex_uptodate = 0; + gfs2_trans_end(sdp); } /** diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index aa014725f84a..27c82f4aaf32 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -991,17 +991,28 @@ static void gfs2_write_unlock(struct inode *inode) gfs2_glock_dq_uninit(>i_gh); } +static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos, + unsigned len, struct iomap *iomap) +{ + struct gfs2_sbd *sdp = GFS2_SB(inode); + + return gfs2_trans_begin(sdp, RES_DINODE + (len >> inode->i_blkbits), 0); +} + static void gfs2_iomap_page_done(struct inode *inode, loff_t pos, unsigned copied, struct page *page, struct iomap *iomap) { struct gfs2_inode *ip = GFS2_I(inode); + struct gfs2_sbd *sdp = GFS2_SB(inode); - if (page) + if (page && !gfs2_is_stuffed(ip)) gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied); + gfs2_trans_end(sdp); } static const struct iomap_page_ops gfs2_iomap_page_ops = { + .page_prepare = gfs2_iomap_page_prepare, .page_done = gfs2_iomap_page_done, }; @@ -1057,31 +1068,45 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos, if (alloc_required) rblocks += gfs2_rg_blocks(ip, data_blocks + ind_blocks); - ret = gfs2_trans_begin(sdp, rblocks, iomap->length >> inode->i_blkbits); - if (ret) - goto out_trans_fail; + if (unstuff || iomap->type == IOMAP_HOLE) { + struct gfs2_trans *tr; - if (unstuff) { - ret = gfs2_unstuff_dinode(ip, NULL); + ret = gfs2_trans_begin(sdp, rblocks, + iomap->length >> inode->i_blkbits); if (ret) - goto out_trans_end; - release_metapath(mp); - ret = 
gfs2_iomap_get(inode, iomap->offset, iomap->length, -flags, iomap, mp); - if (ret) - goto out_trans_end; - } + goto out_trans_fail; - if (iomap->type == IOMAP_HOLE) { - ret = gfs2_iomap_alloc(inode, iomap, flags, mp); - if (ret) { -
[Cluster-devel] [GFS2 PATCH 01/12] gfs2: Fix loop in gfs2_rbm_find (v2)
Fix the resource group wrap-around logic in gfs2_rbm_find that commit e579ed4f44 broke. The bug can lead to unnecessary repeated scanning of the same bitmaps; there is a risk that future changes will turn this into an endless loop. This is an updated version of commit 2d29f6b96d ("gfs2: Fix loop in gfs2_rbm_find") which ended up being reverted because it introduced a performance regression in iozone (see commit e74c98ca2d). Changes since v1: - Simplify the wrap-around logic. - Handle the case where each resource group only has a single bitmap block (small filesystem). - Update rd_extfail_pt whenever we scan the entire bitmap, even when we don't start the scan at the very beginning of the bitmap. Fixes: e579ed4f446e ("GFS2: Introduce rbm field bii") Signed-off-by: Andreas Gruenbacher --- fs/gfs2/rgrp.c | 54 +++--- 1 file changed, 25 insertions(+), 29 deletions(-) diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index 17a8d3b43990..52a4f340a867 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -1729,25 +1729,22 @@ static int gfs2_reservation_check_and_update(struct gfs2_rbm *rbm, static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 *minext, const struct gfs2_inode *ip, bool nowrap) { + bool scan_from_start = rbm->bii == 0 && rbm->offset == 0; struct buffer_head *bh; - int initial_bii; - u32 initial_offset; - int first_bii = rbm->bii; - u32 first_offset = rbm->offset; + int last_bii; u32 offset; u8 *buffer; - int n = 0; - int iters = rbm->rgd->rd_length; + bool wrapped = false; int ret; struct gfs2_bitmap *bi; struct gfs2_extent maxext = { .rbm.rgd = rbm->rgd, }; - /* If we are not starting at the beginning of a bitmap, then we -* need to add one to the bitmap count to ensure that we search -* the starting bitmap twice. + /* +* Determine the last bitmap to search. If we're not starting at the +* beginning of a bitmap, we need to search that bitmap twice to scan +* the entire resource group. 
*/ - if (rbm->offset != 0) - iters++; + last_bii = rbm->bii - (rbm->offset == 0); while(1) { bi = rbm_bi(rbm); @@ -1761,47 +1758,46 @@ static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 *minext, WARN_ON(!buffer_uptodate(bh)); if (state != GFS2_BLKST_UNLINKED && bi->bi_clone) buffer = bi->bi_clone + bi->bi_offset; - initial_offset = rbm->offset; offset = gfs2_bitfit(buffer, bi->bi_bytes, rbm->offset, state); - if (offset == BFITNOENT) - goto bitmap_full; + if (offset == BFITNOENT) { + if (state == GFS2_BLKST_FREE && rbm->offset == 0) + set_bit(GBF_FULL, >bi_flags); + goto next_bitmap; + } rbm->offset = offset; if (ip == NULL) return 0; - initial_bii = rbm->bii; ret = gfs2_reservation_check_and_update(rbm, ip, minext ? *minext : 0, ); if (ret == 0) return 0; - if (ret > 0) { - n += (rbm->bii - initial_bii); + if (ret > 0) goto next_iter; - } if (ret == -E2BIG) { rbm->bii = 0; rbm->offset = 0; - n += (rbm->bii - initial_bii); goto res_covered_end_of_rgrp; } return ret; -bitmap_full: /* Mark bitmap as full and fall through */ - if ((state == GFS2_BLKST_FREE) && initial_offset == 0) - set_bit(GBF_FULL, >bi_flags); - next_bitmap: /* Find next bitmap in the rgrp */ rbm->offset = 0; rbm->bii++; if (rbm->bii == rbm->rgd->rd_length) rbm->bii = 0; res_covered_end_of_rgrp: - if ((rbm->bii == 0) && nowrap) - break; - n++; + if (rbm->bii == 0) { + if (wrapped) + break; + wrapped = true; + if (nowrap) + break; + } next_iter: - if (n >= iters) + /* Have we scanned the entire resource group? */ + if (wrapped && rbm->bii > last_bii) break; } @@ -1811,8 +1807,8 @@ static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, u32 *minext, /* If
[Cluster-devel] [GFS2 PATCH 06/12] gfs2: Remove misleading comments in gfs2_evict_inode
Signed-off-by: Andreas Gruenbacher --- fs/gfs2/super.c | 5 - 1 file changed, 5 deletions(-) diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index e20fa55715e2..a6a325b2a78b 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -1630,8 +1630,6 @@ static void gfs2_evict_inode(struct inode *inode) goto out_truncate; } - /* Case 1 starts here */ - if (S_ISDIR(inode->i_mode) && (ip->i_diskflags & GFS2_DIF_EXHASH)) { error = gfs2_dir_exhash_dealloc(ip); @@ -1670,7 +1668,6 @@ static void gfs2_evict_inode(struct inode *inode) write_inode_now(inode, 1); gfs2_ail_flush(ip->i_gl, 0); - /* Case 2 starts here */ error = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks); if (error) goto out_unlock; @@ -1680,7 +1677,6 @@ static void gfs2_evict_inode(struct inode *inode) gfs2_trans_end(sdp); out_unlock: - /* Error path for case 1 */ if (gfs2_rs_active(>i_res)) gfs2_rs_deltree(>i_res); @@ -1699,7 +1695,6 @@ static void gfs2_evict_inode(struct inode *inode) if (error && error != GLR_TRYFAILED && error != -EROFS) fs_warn(sdp, "gfs2_evict_inode: %d\n", error); out: - /* Case 3 starts here */ truncate_inode_pages_final(>i_data); gfs2_rsqa_delete(ip, NULL); gfs2_ordered_del_inode(ip); -- 2.20.1
[Cluster-devel] [GFS2 PATCH 03/12] gfs2: clean_journal improperly set sd_log_flush_head
From: Bob Peterson This patch fixes regressions in 588bff95c94efc05f9e1a0b19015c9408ed7c0ef. Due to that patch, function clean_journal was setting the value of sd_log_flush_head, but that's only valid if it is replaying the node's own journal. If it's replaying another node's journal, that's completely wrong and will lead to multiple problems. This patch tries to clean up the mess by passing the value of the logical journal block number into gfs2_write_log_header so the function can treat non-owned journals generically. For the local journal, the journal extent map is used for best performance. For other nodes from other journals, new function gfs2_lblk_to_dblk is called to figure it out using gfs2_iomap_get. This patch also tries to establish more consistency when passing journal block parameters by changing several unsigned int types to a consistent u32. Fixes: 588bff95c94e ("GFS2: Reduce code redundancy writing log headers") Signed-off-by: Bob Peterson Reviewed-by: Andreas Gruenbacher --- fs/gfs2/bmap.c | 26 ++ fs/gfs2/bmap.h | 1 + fs/gfs2/incore.h | 2 +- fs/gfs2/log.c | 24 fs/gfs2/log.h | 3 ++- fs/gfs2/lops.c | 6 +++--- fs/gfs2/lops.h | 2 +- fs/gfs2/recovery.c | 10 ++ fs/gfs2/recovery.h | 2 +- 9 files changed, 57 insertions(+), 19 deletions(-) diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index 02b2646d84b3..e95b33b65d89 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -925,6 +925,32 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length, goto out; } +/** + * gfs2_lblk_to_dblk - convert logical block to disk block + * @inode: the inode of the file we're mapping + * @lblock: the block relative to the start of the file + * @dblock: the returned dblock, if no error + * + * This function maps a single block from a file logical block (relative to + * the start of the file) to a file system absolute block using iomap. 
+ * + * Returns: the absolute file system block, or an error + */ +int gfs2_lblk_to_dblk(struct inode *inode, u32 lblock, u64 *dblock) +{ + struct iomap iomap = { }; + struct metapath mp = { .mp_aheight = 1, }; + loff_t pos = (loff_t)lblock << inode->i_blkbits; + int ret; + + ret = gfs2_iomap_get(inode, pos, i_blocksize(inode), 0, , ); + release_metapath(); + if (ret == 0) + *dblock = iomap.addr >> inode->i_blkbits; + + return ret; +} + static int gfs2_write_lock(struct inode *inode) { struct gfs2_inode *ip = GFS2_I(inode); diff --git a/fs/gfs2/bmap.h b/fs/gfs2/bmap.h index 6b18fb323f0a..19a1fd772c61 100644 --- a/fs/gfs2/bmap.h +++ b/fs/gfs2/bmap.h @@ -64,5 +64,6 @@ extern int gfs2_write_alloc_required(struct gfs2_inode *ip, u64 offset, extern int gfs2_map_journal_extents(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd); extern void gfs2_free_journal_extents(struct gfs2_jdesc *jd); extern int __gfs2_punch_hole(struct file *file, loff_t offset, loff_t length); +extern int gfs2_lblk_to_dblk(struct inode *inode, u32 lblock, u64 *dblock); #endif /* __BMAP_DOT_H__ */ diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index cdf07b408f54..86840a70ee1a 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -535,7 +535,7 @@ struct gfs2_jdesc { unsigned long jd_flags; #define JDF_RECOVERY 1 unsigned int jd_jid; - unsigned int jd_blocks; + u32 jd_blocks; int jd_recover_error; /* Replay stuff */ diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index b8830fda51e8..ebbc68dca145 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -666,11 +666,12 @@ void gfs2_write_revokes(struct gfs2_sbd *sdp) } /** - * write_log_header - Write a journal log header buffer at sd_log_flush_head + * gfs2_write_log_header - Write a journal log header buffer at lblock * @sdp: The GFS2 superblock * @jd: journal descriptor of the journal to which we are writing * @seq: sequence number * @tail: tail of the log + * @lblock: value for lh_blkno (block number relative to start of journal) * @flags: log header flags 
GFS2_LOG_HEAD_* * @op_flags: flags to pass to the bio * @@ -678,7 +679,8 @@ void gfs2_write_revokes(struct gfs2_sbd *sdp) */ void gfs2_write_log_header(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd, - u64 seq, u32 tail, u32 flags, int op_flags) + u64 seq, u32 tail, u32 lblock, u32 flags, + int op_flags) { struct gfs2_log_header *lh; u32 hash, crc; @@ -686,7 +688,7 @@ void gfs2_write_log_header(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd, struct gfs2_statfs_change_host *l_sc = >sd_statfs_local; struct timespec64 tv; struct super_block *sb = sdp->sd_vfs; - u64 addr; + u64 dblock; lh = page_address(page); clear_page(lh); @@ -699,15 +701,21 @@ void gfs2_write_log_header(struct gfs2_sbd *sdp, struct gfs2_jdesc
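The block arithmetic in gfs2_lblk_to_dblk above (shift the logical block up to a byte position, map it, shift the resulting byte address back down to a block number) can be illustrated outside the kernel. In this sketch, struct extent and the lookup loop are hypothetical stand-ins for what gfs2_iomap_get provides in the real code; only the two shifts mirror the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record; all fields are in bytes. */
struct extent {
    uint64_t file_pos;   /* start of the extent within the file */
    uint64_t disk_addr;  /* start of the extent on disk */
    uint64_t length;     /* length of the extent */
};

/*
 * Map a logical block (relative to the start of the file) to an absolute
 * disk block. Returns 0 on success and -1 if the block is not mapped.
 * The extent table stands in for the iomap lookup; it is an assumption
 * made for illustration only.
 */
static int lblk_to_dblk(const struct extent *map, size_t n,
                        unsigned int blkbits, uint32_t lblock,
                        uint64_t *dblock)
{
    assert(map != NULL && dblock != NULL && blkbits < 32);

    /* Widen before shifting so that large block numbers do not overflow. */
    uint64_t pos = (uint64_t)lblock << blkbits;

    for (size_t i = 0; i < n; i++) {
        if (pos >= map[i].file_pos &&
            pos - map[i].file_pos < map[i].length) {
            uint64_t addr = map[i].disk_addr + (pos - map[i].file_pos);

            *dblock = addr >> blkbits;   /* bytes back to blocks */
            return 0;
        }
    }
    return -1;
}
```

The cast before the left shift corresponds to the (loff_t) cast in the patch: without it, a 32-bit block number shifted by the block size bits could overflow.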
[Cluster-devel] [GFS2 PATCH 08/12] gfs2: Rename sd_log_le_{revoke, ordered}
Rename sd_log_le_revoke to sd_log_revokes and sd_log_le_ordered to sd_log_ordered: not sure what le stands for here, but it doesn't add clarity, and if it stands for list entry, it's actually confusing as those are both list heads but not list entries. Signed-off-by: Andreas Gruenbacher --- fs/gfs2/incore.h | 4 ++-- fs/gfs2/log.c| 14 +++--- fs/gfs2/log.h| 2 +- fs/gfs2/lops.c | 4 ++-- fs/gfs2/ops_fstype.c | 4 ++-- fs/gfs2/trans.c | 2 +- 6 files changed, 15 insertions(+), 15 deletions(-) diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index 6a94b094a904..78c8e761b321 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -809,8 +809,8 @@ struct gfs2_sbd { atomic_t sd_log_pinned; unsigned int sd_log_num_revoke; - struct list_head sd_log_le_revoke; - struct list_head sd_log_le_ordered; + struct list_head sd_log_revokes; + struct list_head sd_log_ordered; spinlock_t sd_ordered_lock; atomic_t sd_log_thresh1; diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index d55315a46ece..a7febb4bd400 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -551,9 +551,9 @@ static void gfs2_ordered_write(struct gfs2_sbd *sdp) LIST_HEAD(written); spin_lock(>sd_ordered_lock); - list_sort(NULL, >sd_log_le_ordered, _cmp); - while (!list_empty(>sd_log_le_ordered)) { - ip = list_entry(sdp->sd_log_le_ordered.next, struct gfs2_inode, i_ordered); + list_sort(NULL, >sd_log_ordered, _cmp); + while (!list_empty(>sd_log_ordered)) { + ip = list_entry(sdp->sd_log_ordered.next, struct gfs2_inode, i_ordered); if (ip->i_inode.i_mapping->nrpages == 0) { test_and_clear_bit(GIF_ORDERED, >i_flags); list_del(>i_ordered); @@ -564,7 +564,7 @@ static void gfs2_ordered_write(struct gfs2_sbd *sdp) filemap_fdatawrite(ip->i_inode.i_mapping); spin_lock(>sd_ordered_lock); } - list_splice(, >sd_log_le_ordered); + list_splice(, >sd_log_ordered); spin_unlock(>sd_ordered_lock); } @@ -573,8 +573,8 @@ static void gfs2_ordered_wait(struct gfs2_sbd *sdp) struct gfs2_inode *ip; spin_lock(>sd_ordered_lock); - while 
(!list_empty(>sd_log_le_ordered)) { - ip = list_entry(sdp->sd_log_le_ordered.next, struct gfs2_inode, i_ordered); + while (!list_empty(>sd_log_ordered)) { + ip = list_entry(sdp->sd_log_ordered.next, struct gfs2_inode, i_ordered); list_del(>i_ordered); WARN_ON(!test_and_clear_bit(GIF_ORDERED, >i_flags)); if (ip->i_inode.i_mapping->nrpages == 0) @@ -611,7 +611,7 @@ void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd) gfs2_glock_hold(gl); } set_bit(GLF_LFLUSH, >gl_flags); - list_add(>bd_list, >sd_log_le_revoke); + list_add(>bd_list, >sd_log_revokes); } void gfs2_write_revokes(struct gfs2_sbd *sdp) diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h index 86d07d436cdf..7a34a3234266 100644 --- a/fs/gfs2/log.h +++ b/fs/gfs2/log.h @@ -59,7 +59,7 @@ static inline void gfs2_ordered_add_inode(struct gfs2_inode *ip) if (!test_bit(GIF_ORDERED, >i_flags)) { spin_lock(>sd_ordered_lock); if (!test_and_set_bit(GIF_ORDERED, >i_flags)) - list_add(>i_ordered, >sd_log_le_ordered); + list_add(>i_ordered, >sd_log_ordered); spin_unlock(>sd_ordered_lock); } } diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c index 6c1ec6c60639..6af6a3cea967 100644 --- a/fs/gfs2/lops.c +++ b/fs/gfs2/lops.c @@ -623,7 +623,7 @@ static void revoke_lo_before_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr) { struct gfs2_meta_header *mh; unsigned int offset; - struct list_head *head = >sd_log_le_revoke; + struct list_head *head = >sd_log_revokes; struct gfs2_bufdata *bd; struct page *page; unsigned int length; @@ -661,7 +661,7 @@ static void revoke_lo_before_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr) static void revoke_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr) { - struct list_head *head = >sd_log_le_revoke; + struct list_head *head = >sd_log_revokes; struct gfs2_bufdata *bd, *tmp; /* diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index b041cb8ae383..abfaecde0e3d 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -117,8 +117,8 @@ static struct gfs2_sbd 
*init_sbd(struct super_block *sb) spin_lock_init(>sd_log_lock); atomic_set(>sd_log_pinned, 0); - INIT_LIST_HEAD(>sd_log_le_revoke); - INIT_LIST_HEAD(>sd_log_le_ordered); + INIT_LIST_HEAD(>sd_log_revokes); + INIT_LIST_HEAD(>sd_log_ordered); spin_lock_init(>sd_ordered_lock);
[Cluster-devel] [GFS2 PATCH 12/12] gfs2: read journal in large chunks
From: Abhi Das Use bios to read in the journal into the address space of the journal inode (jd_inode), sequentially and in large chunks. This is faster for locating the journal head than the previous binary search approach. When performing recovery, we keep the journal in the address space until recovery is done, which further speeds things up. Signed-off-by: Abhi Das Signed-off-by: Andreas Gruenbacher --- fs/gfs2/glops.c | 3 +- fs/gfs2/log.c| 4 +- fs/gfs2/lops.c | 212 +-- fs/gfs2/lops.h | 4 +- fs/gfs2/ops_fstype.c | 3 +- fs/gfs2/recovery.c | 125 + fs/gfs2/recovery.h | 2 - fs/gfs2/super.c | 5 +- 8 files changed, 219 insertions(+), 139 deletions(-) diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c index 78510ab91835..24ada3ccc525 100644 --- a/fs/gfs2/glops.c +++ b/fs/gfs2/glops.c @@ -28,6 +28,7 @@ #include "util.h" #include "trans.h" #include "dir.h" +#include "lops.h" struct workqueue_struct *gfs2_freeze_wq; @@ -531,7 +532,7 @@ static int freeze_go_xmote_bh(struct gfs2_glock *gl, struct gfs2_holder *gh) if (test_bit(SDF_JOURNAL_LIVE, >sd_flags)) { j_gl->gl_ops->go_inval(j_gl, DIO_METADATA); - error = gfs2_find_jhead(sdp->sd_jdesc, ); + error = gfs2_find_jhead(sdp->sd_jdesc, , false); if (error) gfs2_consist(sdp); if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index a7febb4bd400..a2e1df488df0 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -744,7 +744,7 @@ void gfs2_write_log_header(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd, lh->lh_crc = cpu_to_be32(crc); gfs2_log_write(sdp, page, sb->s_blocksize, 0, dblock); - gfs2_log_submit_bio(>sd_log_bio, REQ_OP_WRITE, op_flags); + gfs2_log_submit_bio(>sd_log_bio, REQ_OP_WRITE | op_flags); log_flush_wait(sdp); } @@ -821,7 +821,7 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags) gfs2_ordered_write(sdp); lops_before_commit(sdp, tr); - gfs2_log_submit_bio(>sd_log_bio, REQ_OP_WRITE, 0); + gfs2_log_submit_bio(>sd_log_bio, REQ_OP_WRITE); if (sdp->sd_log_head !=
sdp->sd_log_flush_head) { log_flush_wait(sdp); diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c index 6af6a3cea967..ce048a9e058d 100644 --- a/fs/gfs2/lops.c +++ b/fs/gfs2/lops.c @@ -17,7 +17,9 @@ #include #include #include +#include +#include "bmap.h" #include "dir.h" #include "gfs2.h" #include "incore.h" @@ -194,7 +196,6 @@ static void gfs2_end_log_write_bh(struct gfs2_sbd *sdp, /** * gfs2_end_log_write - end of i/o to the log * @bio: The bio - * @error: Status of i/o request * * Each bio_vec contains either data from the pagecache or data * relating to the log itself. Here we iterate over the bio_vec @@ -232,20 +233,19 @@ static void gfs2_end_log_write(struct bio *bio) /** * gfs2_log_submit_bio - Submit any pending log bio * @biop: Address of the bio pointer - * @op: REQ_OP - * @op_flags: req_flag_bits + * @opf: REQ_OP | op_flags * * Submit any pending part-built or full bio to the block device. If * there is no pending bio, then this is a no-op. */ -void gfs2_log_submit_bio(struct bio **biop, int op, int op_flags) +void gfs2_log_submit_bio(struct bio **biop, int opf) { struct bio *bio = *biop; if (bio) { struct gfs2_sbd *sdp = bio->bi_private; atomic_inc(>sd_log_in_flight); - bio_set_op_attrs(bio, op, op_flags); + bio->bi_opf = opf; submit_bio(bio); *biop = NULL; } @@ -306,7 +306,7 @@ static struct bio *gfs2_log_get_bio(struct gfs2_sbd *sdp, u64 blkno, nblk >>= sdp->sd_fsb2bb_shift; if (blkno == nblk && !flush) return bio; - gfs2_log_submit_bio(biop, op, 0); + gfs2_log_submit_bio(biop, op); } *biop = gfs2_log_alloc_bio(sdp, blkno, end_io); @@ -377,6 +377,206 @@ void gfs2_log_write_page(struct gfs2_sbd *sdp, struct page *page) gfs2_log_bmap(sdp)); } +/** + * gfs2_end_log_read - end I/O callback for reads from the log + * @bio: The bio + * + * Simply unlock the pages in the bio. The main thread will wait on them and + * process them in order as necessary. 
+ */ + +static void gfs2_end_log_read(struct bio *bio) +{ + struct page *page; + struct bio_vec *bvec; + int i; + struct bvec_iter_all iter_all; + + bio_for_each_segment_all(bvec, bio, i, iter_all) { + page = bvec->bv_page; + if (bio->bi_status) { + int err = blk_status_to_errno(bio->bi_status); + + SetPageError(page); + mapping_set_error(page->mapping, err);
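The idea from the commit message above, locating the journal head by reading the journal sequentially in large chunks instead of by binary search, can be sketched in user space as follows. Everything here is an assumption made for illustration: struct blk, CHUNK_BLOCKS and find_head() are hypothetical, and the stopping rule (stop at the first log header whose sequence number is not higher than the best one seen so far) is a simplification, not a copy of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BLOCKS 4   /* hypothetical number of blocks per read */

/* Hypothetical in-memory view of one journal block. */
struct blk {
    int is_header;       /* non-zero if the block holds a log header */
    uint64_t seq;        /* sequence number, valid only for headers */
};

/*
 * Scan the journal front to back, one chunk at a time, and return the
 * index of the head (the log header with the highest sequence number),
 * or -1 if the journal contains no log header at all.
 */
static long find_head(const struct blk *journal, size_t nblocks)
{
    long head = -1;
    uint64_t best = 0;

    assert(journal != NULL);

    for (size_t start = 0; start < nblocks; start += CHUNK_BLOCKS) {
        size_t end = start + CHUNK_BLOCKS;

        if (end > nblocks)
            end = nblocks;

        /* In a real filesystem this chunk would be one large read. */
        for (size_t i = start; i < end; i++) {
            if (!journal[i].is_header)
                continue;
            /* The sequence dropped: we have moved past the head. */
            if (head >= 0 && journal[i].seq <= best)
                return head;
            head = (long)i;
            best = journal[i].seq;
        }
    }
    return head;
}
```

A sequential scan touches more blocks than a binary search would, but it issues a few large reads instead of many small dependent ones, which is the trade-off the commit message describes.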
[Cluster-devel] [GFS2 PATCH 00/12] Pre-pull patch posting (merge window)
Hello,

for this merge window, we've got the following patches:

"gfs2: Fix loop in gfs2_rbm_find (v2)"

  A rework of a fix we ended up reverting in the 5.0 kernel because of an iozone performance regression.

"gfs2: read journal in large chunks" and
"gfs2: fix race between gfs2_freeze_func and unmount"

  An improved version of a commit we also ended up reverting in the 5.0 kernel because of a regression in xfstest generic/311. It turns out that the journal changes were mostly innocent and that unfreeze didn't wait for the freeze to complete, which caused the filesystem to be unmounted before it was actually idle.

"gfs2: Fix occasional glock use-after-free"
"gfs2: Fix iomap write page reclaim deadlock"
"gfs2: Fix lru_count going negative"

  Fixes for various problems reported and partially fixed by Citrix engineers. Thank you very much.

"gfs2: clean_journal improperly set sd_log_flush_head"

  Another fix from Bob.

A few other minor cleanups.

Regards,
Andreas

--
Abhi Das (2):
  gfs2: fix race between gfs2_freeze_func and unmount
  gfs2: read journal in large chunks

Andreas Gruenbacher (7):
  gfs2: Fix loop in gfs2_rbm_find (v2)
  gfs2: Fix occasional glock use-after-free
  gfs2: Remove misleading comments in gfs2_evict_inode
  gfs2: Remove unnecessary extern declarations
  gfs2: Rename sd_log_le_{revoke,ordered}
  gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke}
  gfs2: Fix iomap write page reclaim deadlock

Bob Peterson (2):
  gfs2: clean_journal improperly set sd_log_flush_head
  gfs2: Replace gl_revokes with a GLF flag

Ross Lagerwall (1):
  gfs2: Fix lru_count going negative

--
2.20.1