On Tue, Apr 30, 2019 at 12:09:34AM +0200, Andreas Gruenbacher wrote:
> Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is
> doing buffered writes by starting a transaction in iomap_begin, writing a
> range of pages, and ending that transaction in iomap_end. This approach
gfs2_unfreeze() doesn't wait for gfs2_freeze_func() to complete. If a
umount is issued right after unfreeze, it could result in an
inconsistent filesystem because some journal data (statfs update)
wasn't written out.
This patch causes gfs2_unfreeze() to wait for gfs2_freeze_func() to complete
On Tue, Apr 30, 2019 at 12:09:33AM +0200, Andreas Gruenbacher wrote:
> Move the page_done callback into a separate iomap_page_ops structure and
> add a page_prepare callback to be called before the next page is written
> to. In gfs2, we'll want to start a transaction in page_prepare and end it
On Tue, 30 Apr 2019 at 17:33, Darrick J. Wong wrote:
> On Tue, Apr 30, 2019 at 12:09:34AM +0200, Andreas Gruenbacher wrote:
> > Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is
> > doing buffered writes by starting a transaction in iomap_begin, writing a
> > range of
> >
On Tue, Apr 30, 2019 at 05:39:28PM +0200, Andreas Gruenbacher wrote:
> On Tue, 30 Apr 2019 at 17:33, Darrick J. Wong wrote:
> > On Tue, Apr 30, 2019 at 12:09:34AM +0200, Andreas Gruenbacher wrote:
> > > Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2
> > > is doing
> > >
On Tue, 30 Apr 2019 at 17:48, Darrick J. Wong wrote:
> Ok, I'll take the first four patches through the iomap branch and cc you
> on the pull request.
Ok great, thanks.
Andreas
On Tue, 30 Apr 2019 at 23:54, Abhi Das wrote:
> As part of the freeze operation, gfs2_freeze_func() is left blocking
> on a request to hold the sd_freeze_gl in SH. This glock is held in EX
> by the gfs2_freeze() code.
>
> A subsequent call to gfs2_unfreeze() releases the EXclusively held
>
As part of the freeze operation, gfs2_freeze_func() is left blocking
on a request to hold the sd_freeze_gl in SH. This glock is held in EX
by the gfs2_freeze() code.
A subsequent call to gfs2_unfreeze() releases the EXclusively held
sd_freeze_gl, which allows gfs2_freeze_func() to acquire it in
Hi,
On 01/05/2019 00:03, Bob Peterson wrote:
Here is version 3 of the patch set I posted on 23 April. It is revised
based on bugs I found testing with xfstests.
This is a collection of patches I've been using to address the myriad
of recovery problems I've found. I'm still finding them, so the
Hi,
On 01/05/2019 00:03, Bob Peterson wrote:
This patch addresses various problems with gfs2/dlm recovery.
For example, suppose a node with a bunch of gfs2 mounts suddenly
reboots due to kernel panic, and dlm determines it should perform
recovery. DLM does so from a pseudo-state machine
On Mon, Apr 29, 2019 at 07:50:28PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 30, 2019 at 12:09:29AM +0200, Andreas Gruenbacher wrote:
> > Here's another update of this patch queue, hopefully with all wrinkles
> > ironed out now.
> >
> > Darrick, I think Linus would be unhappy seeing the first
NACK. Andreas mentioned that the description could be more descriptive and
that we should be using clear_bit_unlock() instead of clear_bit(). I'll
post a v2 shortly with these changes.
Cheers!
--Abhi
On Tue, Apr 30, 2019 at 12:48 PM Abhi Das wrote:
> gfs2_unfreeze() doesn't wait for
Function gfs2_freeze had a case statement that simply checked the
error code, but the break statements just made the logic hard to
read. This patch simplifies the logic in favor of a simple if.
Signed-off-by: Bob Peterson
---
fs/gfs2/super.c | 10 ++
1 file changed, 2 insertions(+), 8
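The kind of simplification described in this patch can be illustrated outside the kernel. The sketch below is not the gfs2 code: freeze_msg_switch() and freeze_msg_if() are hypothetical helpers, and the message strings are placeholders. It only shows that a two-way switch whose branches each end in a break is equivalent to a plain if/else.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical "before" form: a switch with one special case and a default. */
void freeze_msg_switch(int error, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	switch (error) {
	case -EBUSY:
		snprintf(buf, len, "waiting for recovery before freeze");
		break;
	default:
		snprintf(buf, len, "error freezing FS: %d", error);
		break;
	}
}

/* Hypothetical "after" form: the same decision written as a simple if. */
void freeze_msg_if(int error, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (error == -EBUSY)
		snprintf(buf, len, "waiting for recovery before freeze");
	else
		snprintf(buf, len, "error freezing FS: %d", error);
}
```

Both helpers produce the same text for any error code; the second form is shorter and has no break statements to read past.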
This patch addresses various problems with gfs2/dlm recovery.
For example, suppose a node with a bunch of gfs2 mounts suddenly
reboots due to kernel panic, and dlm determines it should perform
recovery. DLM does so from a pseudo-state machine calling various
callbacks into lock_dlm to perform a
When a node withdraws from a file system, it often leaves its journal
in an incomplete state. This is especially true when the withdraw is
caused by io errors writing to the journal. Before this patch, a
withdraw would try to write a "shutdown" record to the journal, tell
dlm it's done with the
Before this patch, all io errors received by the quota daemon or the
logd daemon would cause a complaint message to be issued, such as:
gfs2: fsid=dm-13.0: Error 10 writing to journal, jid=0
This patch changes it so that the error message is only issued the
first time the error is
Before this patch, if gfs2_ail_empty_gl saw there was nothing on
the ail list, it would return and not flush the log. The problem
is that there could still be a revoke for the rgrp sitting on the
sd_log_le_revoke list that's been recently taken off the ail list.
But that revoke still needs to be
Before this patch, when a file system was withdrawn, all further
attempts to enqueue or promote glocks were rejected and returned
-EIO. This is only important for media-backed glocks like inode
and rgrp glocks. All other glocks may be safely used because there
is no potential for metadata
File system withdraws can be delayed when inconsistencies are
discovered when we cannot withdraw immediately, for example, when
critical spin_locks are held. But delaying the withdraw can cause
gfs2 to ignore the error and keep running for a short period of time.
For example, an rgrp glock may be
Before this patch, gfs2 kept track of journal io errors in two
places: sd_log_error and the SDF_AIL1_IO_ERROR flag in sd_flags.
This patch consolidates the two into sd_log_error so that it
reflects the first error encountered writing to the journal.
In future patches, we will take advantage of this
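Keeping a single field that "reflects the first error encountered" is commonly done with a compare-and-swap from zero. The following is a userspace sketch under that assumption, using C11 atomics rather than the kernel's primitives; struct log_state, log_state_init(), record_first_error() and first_error() are hypothetical names and do not come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-filesystem error field. */
struct log_state {
	atomic_int log_error;	/* 0 means no error has been recorded */
};

void log_state_init(struct log_state *ls)
{
	atomic_init(&ls->log_error, 0);
}

/*
 * Store err only if no earlier error is present. Returns true when this
 * call recorded the error, which a caller could also use to decide
 * whether to print a message just once.
 */
bool record_first_error(struct log_state *ls, int err)
{
	int expected = 0;

	assert(err != 0);
	return atomic_compare_exchange_strong(&ls->log_error, &expected, err);
}

/* Read back the first recorded error, or 0 if there is none. */
int first_error(struct log_state *ls)
{
	return atomic_load(&ls->log_error);
}
```

With this shape, later failures never overwrite the original cause, so whatever reads the field sees the error that started the problem.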
Before this patch, function gfs2_freeze would loop forever if the
file system being frozen is withdrawn. That's because function
gfs2_lock_fs_check_clean tries to enqueue the glock of the journal
and the gfs2_glock returns -EIO because you can't enqueue a journaled
glock after a withdraw.
Before this patch, function check_journal_clean would give messages
related to journal recovery. That's fine for mount time, but when a
node withdraws and forces replay that way, we don't want all those
distracting and misleading messages. This patch adds a new parameter
to make those messages
Before this patch, function do_xmote just assumed all the writes
submitted to the journal were finished and successful, and it
called the go_unlock function to release the dlm lock. But if
they're not, and a revoke failed to make its way to the journal,
a journal replay on another node will cause
Before this patch, gfs2 saved the pointers to the two daemon threads
(logd and quotad) in the superblock, but they were never cleared,
even if the threads were stopped (e.g. on remount -o ro). That meant
that certain error conditions (like a withdrawn file system) could
race. For example, xfstests
Here is version 3 of the patch set I posted on 23 April. It is revised
based on bugs I found testing with xfstests.
This is a collection of patches I've been using to address the myriad
of recovery problems I've found. I'm still finding them, so the battle
is not won yet. I'm not convinced we
Before this patch, function gfs2_log_flush could get into an infinite
loop trying to clear out its ail1 list. If the file system was
withdrawn (or pending withdraw) due to a problem with writing the ail1
list, it would never clear out the list, and therefore, would loop
infinitely. This patch
When a journal is replayed, gfs2 logs a message similar to:
jid=X: Replaying journal...
This patch adds the tail and block number so that the range of the
replayed block is also printed. These values will match the values
shown if the journal is dumped with gfs2_edit -p journalX. The
resulting
For its journal processing, gfs2 kept track of the number of buffers
added and removed on a per-transaction basis. These values are used
to calculate space needed in the journal. But while these calculations
make sense for the number of buffers, they make no sense for revokes.
Revokes are managed
This patch adds some instrumentation in gfs2's journal replay that
indicates when we're about to overwrite a rgrp for which we already
have a valid buffer_head.
Signed-off-by: Bob Peterson
---
fs/gfs2/lops.c | 22 --
1 file changed, 20 insertions(+), 2 deletions(-)
diff
Before this patch, if a process encountered an error and decided to
withdraw while another process was already withdrawing, the secondary
withdraw would be silently ignored, which set it free
to proceed with its processing, unlock any locks, etc. That's correct
behavior if the
On Tue, Apr 30, 2019 at 12:09:30AM +0200, Andreas Gruenbacher wrote:
> From: Christoph Hellwig
>
> Move the call to __generic_write_end into iomap_write_end instead of
> duplicating it in each of the three branches. This requires open coding
> the generic_write_end for the buffer_head case.
>
On Tue, Apr 30, 2019 at 12:09:31AM +0200, Andreas Gruenbacher wrote:
> The VFS-internal __generic_write_end helper always returns the value of
> its @copied argument. This can be confusing, and it isn't very useful
> anyway, so turn __generic_write_end into a function returning void
> instead.
On Tue, Apr 30, 2019 at 12:09:32AM +0200, Andreas Gruenbacher wrote:
> In iomap_write_end, we're not holding a page reference anymore when
> calling the page_done callback, but the callback needs that reference to
> access the page. To fix that, move the put_page call in
> __generic_write_end
On Tue, Apr 30, 2019 at 12:09:31AM +0200, Andreas Gruenbacher wrote:
> The VFS-internal __generic_write_end helper always returns the value of
> its @copied argument. This can be confusing, and it isn't very useful
> anyway, so turn __generic_write_end into a function returning void
> instead.
>