Hi,
This is a second draft of a two-patch set to fix some of the nasty
journal recovery problems I've found lately.
The original post from 08 November had horribly bad and inaccurate
comments, as Steve Whitehouse and Andreas Gruenbacher pointed out.
This version is hopefully better and more
This patch uses the "live" glock and some new LVBs (lock value blocks)
to signal when a node has withdrawn from a file system. Nodes that see
this try to initiate journal recovery.
Definitions:
1. The "withdrawer" is a node that has withdrawn from a gfs2
file system due to some inconsistency.
2. The "recoverer"
This patch addresses various problems with gfs2/dlm recovery.
For example, suppose a node with a bunch of gfs2 mounts suddenly
reboots due to a kernel panic, and DLM determines it should perform
recovery. DLM does so from a pseudo-state machine, calling various
callbacks into lock_dlm to perform a
On Tue, Nov 20, 2018 at 09:35:07PM -0800, Sagi Grimberg wrote:
>
> > > Wait, I see that the bvec is still a single array per bio. When you said
> > > a table I thought you meant a 2-dimensional array...
> >
> > I mean a new 1-d table A has to be created for multiple bios in one rq,
> > and build
Implement individual lock for SEEK_END for ext4 which directly calls
generic_file_llseek_size().
Signed-off-by: Eiichi Tsukata
---
fs/ext4/file.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d49837b..6479f3066043 100644
---
Implement individual lock for SEEK_END for overlayfs which directly calls
generic_file_llseek_size().
Signed-off-by: Eiichi Tsukata
---
fs/overlayfs/file.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
The commit ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
removed almost all locks in llseek() including SEEK_END. It was based on
the idea that write() updates the size atomically. But in fact, write() can be
divided into two or more parts in generic_perform_write() when pos
straddles
On Wed, Nov 21, 2018 at 11:43:59AM +0900, Eiichi Tsukata wrote:
> This patch itself seems to be just a cleanup, but with the
> commit b25bd1d9fd87 ("vfs: fix race between llseek SEEK_END and write")
> it fixes a race.
Please move this patch to the beginning of the series and replace
the commit log
Some file systems (including ext4, xfs, ramfs ...) have the following
problem as I've described in the commit message of the 1/4 patch.
The commit ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
removed almost all locks in llseek() including SEEK_END. It was based on the
idea
This patch itself seems to be just a cleanup, but with the
commit b25bd1d9fd87 ("vfs: fix race between llseek SEEK_END and write")
it fixes a race.
Signed-off-by: Eiichi Tsukata
---
fs/f2fs/file.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/fs/f2fs/file.c
On 21.11.18 г. 5:23 ч., Ming Lei wrote:
> Now multi-page bvec is supported, some helpers may return page by
> page, meantime some may return segment by segment, this patch
> documents the usage.
>
> Signed-off-by: Ming Lei
> ---
> Documentation/block/biovecs.txt | 24
On Tue, Nov 20, 2018 at 09:35:07PM -0800, Sagi Grimberg wrote:
>> Given it is over TCP, I guess it should be doable for you to preallocate one
>> 256-bvec table in one page for each request, then sets the max segment size
>> as
>> (unsigned int)-1, and max segment number as 256, the preallocated
> +#define bio_iter_mp_iovec(bio, iter) \
> + segment_iter_bvec((bio)->bi_io_vec, (iter))
Besides the mp naming we'd like to get rid of, there is also just
a single user of this macro; please just expand it there.
> +#define segment_iter_bvec(bvec, iter)
On Wed, Nov 21, 2018 at 03:33:55PM +0100, Christoph Hellwig wrote:
> > + non-cluster.o
>
> Do we really need a new source file for these few functions?
>
> > default:
> > + if (!blk_queue_cluster(q)) {
> > + blk_queue_non_cluster_bio(q, bio);
> >
On Wed, Nov 21, 2018 at 11:37:27PM +0800, Ming Lei wrote:
> > + bio = bio_alloc_bioset(GFP_NOIO, bio_segments(*bio_orig),
> > + _cluster_bio_set);
>
> bio_segments(*bio_orig) may be > 256, so bio_alloc_bioset() may fail.
Nothing a little min with BIO_MAX_PAGES couldn't fix.
On Wed, Nov 21, 2018 at 02:32:44PM +0100, Christoph Hellwig wrote:
> > +#define bio_iter_mp_iovec(bio, iter) \
> > + segment_iter_bvec((bio)->bi_io_vec, (iter))
>
> Besides the mp naming we'd like to get rid of, there is also just
> a single user of this macro,
On Wed, Nov 21, 2018 at 03:55:02PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 21, 2018 at 11:23:23AM +0800, Ming Lei wrote:
> > if (bio->bi_vcnt > 0) {
> > - struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
> > + struct bio_vec bv;
> > + struct bio_vec *seg =
On Wed, Nov 21, 2018 at 11:06:11PM +0800, Ming Lei wrote:
> bvec_iter_* is used for single-page bvec in current linus tree, and there are
> lots of users now:
>
> [linux]$ git grep -n "bvec_iter_*" ./ | wc
> 191 995 13242
>
> If we have to switch it first, it can be a big change, just
On Wed, Nov 21, 2018 at 11:31:36PM +0800, Ming Lei wrote:
> > But while looking over this I wonder why we even need the max_seg_len
> > here. The only thing __bvec_iter_advance does is to move bi_bvec_done
> > and bi_idx forward, with corresponding decrements of bi_size. As far
> > as I can tell
On Wed, Nov 21, 2018 at 11:48:13PM +0800, Ming Lei wrote:
> I guess the correct check should be:
>
> end_addr = vec_addr + bv->bv_offset + bv->bv_len;
> if (same_page &&
> (end_addr & PAGE_MASK) != (page_addr & PAGE_MASK))
>
On Wed, Nov 21, 2018 at 11:23:10AM +0800, Ming Lei wrote:
> This patch introduces helpers of 'segment_iter_*' for multipage
> bvec support.
>
> The introduced helpers treat one bvec as a real multi-page segment,
> which may include more than one page.
Unless I'm missing something these bvec vs
On Wed, Nov 21, 2018 at 02:19:28PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 21, 2018 at 11:23:10AM +0800, Ming Lei wrote:
> > This patch introduces helpers of 'segment_iter_*' for multipage
> > bvec support.
> >
> > The introduced helpers treat one bvec as a real multi-page segment,
> > which
On Wed, Nov 21, 2018 at 11:23:23AM +0800, Ming Lei wrote:
> if (bio->bi_vcnt > 0) {
> - struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
> + struct bio_vec bv;
> + struct bio_vec *seg = &bio->bi_io_vec[bio->bi_vcnt - 1];
>
> - if (page ==
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 1ad6eafc43f2..a281b6737b61 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -805,6 +805,10 @@ struct req_iterator {
> __rq_for_each_bio(_iter.bio, _rq) \
>
On Wed, Nov 21, 2018 at 11:23:19AM +0800, Ming Lei wrote:
> bch_bio_alloc_pages() is always called on one new bio, so it is safe
> to access the bvec table directly. Given it is the only case of this
> kind, open code the bvec table access since bio_for_each_segment_all()
> will be changed to
On Wed, Nov 21, 2018 at 11:23:20AM +0800, Ming Lei wrote:
> This patch introduces one extra iterator variable to
> bio_for_each_segment_all(),
> then we can allow bio_for_each_segment_all() to iterate over multi-page bvec.
>
> Given it is just one mechanical & simple change on all
>
On Wed, Nov 21, 2018 at 09:45:25AM +0200, Nikolay Borisov wrote:
> > + bio_for_each_segment_all()
> > + bio_first_bvec_all()
> > + bio_first_page_all()
> > + bio_last_bvec_all()
> > +
> > +* The following helpers iterate over single-page bvecs. The passed 'struct
> > +bio_vec' will contain
Actually..
I think we can kill this code entirely. If we look at what the
clustering setting is really about, it is to avoid ever merging a
segment that spans a page boundary. And we should be able to do
that with something like this before your series:
---
From
On Thu, Nov 22, 2018 at 02:40:50PM +0900, Eiichi Tsukata wrote:
> 2018年11月21日(水) 13:54 Al Viro :
> >
> > On Wed, Nov 21, 2018 at 11:43:56AM +0900, Eiichi Tsukata wrote:
> > > Some file systems (including ext4, xfs, ramfs ...) have the following
> > > problem as I've described in the commit message
On Wed, Nov 21, 2018 at 05:08:11PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 21, 2018 at 11:06:11PM +0800, Ming Lei wrote:
> > bvec_iter_* is used for single-page bvec in current linus tree, and there
> > are
> > lots of users now:
> >
> > [linux]$ git grep -n "bvec_iter_*" ./ | wc
> >
2018年11月21日(水) 18:23 Christoph Hellwig :
>
> On Wed, Nov 21, 2018 at 11:43:59AM +0900, Eiichi Tsukata wrote:
> > This patch itself seems to be just a cleanup, but with the
> > commit b25bd1d9fd87 ("vfs: fix race between llseek SEEK_END and write")
> > it fixes a race.
>
> Please move this patch to
On Wed, Nov 21, 2018 at 05:10:25PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 21, 2018 at 11:31:36PM +0800, Ming Lei wrote:
> > > But while looking over this I wonder why we even need the max_seg_len
> > > here. The only thing __bvec_iter_advance does is to move bi_bvec_done
> > > and bi_idx
2018年11月21日(水) 13:54 Al Viro :
>
> On Wed, Nov 21, 2018 at 11:43:56AM +0900, Eiichi Tsukata wrote:
> > Some file systems (including ext4, xfs, ramfs ...) have the following
> > problem as I've described in the commit message of the 1/4 patch.
> >
> > The commit ef3d0fd27e90 ("vfs: do (nearly)
On Wed, Nov 21, 2018 at 05:10:25PM +0100, Christoph Hellwig wrote:
> No - I think we can always use the code without any segment in
> bvec_iter_advance. Because bvec_iter_advance only operates on the
> iteractor, the generation of an actual single-page or multi-page
> bvec is left to the caller