[Cluster-devel] [PATCH 0/2] gfs2: improvements to recovery and withdraw process (v2)

2018-11-21 Thread Bob Peterson
Hi, This is a second draft of a two-patch set to fix some of the nasty journal recovery problems I've found lately. The original post from 08 November had horribly bad and inaccurate comments, as Steve Whitehouse and Andreas Gruenbacher pointed out. This version is hopefully better and more

[Cluster-devel] [PATCH 2/2] gfs2: initiate journal recovery as soon as a node withdraws

2018-11-21 Thread Bob Peterson
This patch uses the "live" glock and some new lvbs to signal when a node has withdrawn from a file system. Nodes that see this try to initiate journal recovery. Definitions: 1. The "withdrawer" is a node that has withdrawn from a gfs2 file system due to some inconsistency. 2. The "recoverer"

[Cluster-devel] [PATCH 1/2] gfs2: Ignore recovery attempts if gfs2 has io error or is withdrawn

2018-11-21 Thread Bob Peterson
This patch addresses various problems with gfs2/dlm recovery. For example, suppose a node with a bunch of gfs2 mounts suddenly reboots due to kernel panic, and dlm determines it should perform recovery. DLM does so from a pseudo-state machine calling various callbacks into lock_dlm to perform a

Re: [Cluster-devel] [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-21 Thread Ming Lei
On Tue, Nov 20, 2018 at 09:35:07PM -0800, Sagi Grimberg wrote: > > > > Wait, I see that the bvec is still a single array per bio. When you said > > > a table I thought you meant a 2-dimensional array... > > > > I mean a new 1-d table A has to be created for multiple bios in one rq, > > and build

[Cluster-devel] [PATCH v1 2/4] ext4: fix race between llseek SEEK_END and write

2018-11-21 Thread Eiichi Tsukata
Implement an individual lock for SEEK_END for ext4, which directly calls generic_file_llseek_size(). Signed-off-by: Eiichi Tsukata --- fs/ext4/file.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 69d65d49837b..6479f3066043 100644 ---

[Cluster-devel] [PATCH v1 4/4] overlayfs: fix race between llseek SEEK_END and write

2018-11-21 Thread Eiichi Tsukata
Implement an individual lock for SEEK_END for overlayfs, which directly calls generic_file_llseek_size(). Signed-off-by: Eiichi Tsukata --- fs/overlayfs/file.c | 23 --- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c

[Cluster-devel] [PATCH v1 1/4] vfs: fix race between llseek SEEK_END and write

2018-11-21 Thread Eiichi Tsukata
The commit ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek") removed almost all locks in llseek() including SEEK_END. It was based on the idea that write() updates size atomically. But in fact, write() can be divided into two or more parts in generic_perform_write() when pos straddles
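The race described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: `model_file`, `model_write()`, `record_size()` and `demo_split_write()` are hypothetical names, and the model assumes the write is split at a 4096-byte page boundary, which the truncated text does not confirm. The probe callback stands in for a concurrent lockless lseek(fd, 0, SEEK_END).

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the write is split at page boundaries of this size. */
#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical stand-in for an inode; only the size matters here. */
struct model_file {
	unsigned long size;
};

/* Called after each chunk; plays the role of a lockless SEEK_END reader. */
typedef void (*seek_end_probe)(const struct model_file *f, void *ctx);

/*
 * Model of a write that is carried out in page-sized chunks. The size is
 * published after every chunk, so a reader that takes no lock can observe
 * it between chunks.
 */
static void model_write(struct model_file *f, unsigned long pos,
			unsigned long len, seek_end_probe probe, void *ctx)
{
	while (len > 0) {
		unsigned long room = MODEL_PAGE_SIZE - (pos % MODEL_PAGE_SIZE);
		unsigned long chunk = len < room ? len : room;

		pos += chunk;
		len -= chunk;
		if (pos > f->size)
			f->size = pos;
		if (probe)
			probe(f, ctx);
	}
}

/* Records every size the reader would have seen. */
struct probe_log {
	unsigned long seen[8];
	size_t count;
};

static void record_size(const struct model_file *f, void *ctx)
{
	struct probe_log *log = ctx;

	if (log->count < 8)
		log->seen[log->count++] = f->size;
}

/* A 100-byte write at offset 4090 is carried out as 6 + 94 bytes. */
static void demo_split_write(void)
{
	struct model_file f = { .size = 4090 };
	struct probe_log log = { .count = 0 };

	model_write(&f, 4090, 100, record_size, &log);
	assert(log.count == 2);
	assert(log.seen[0] == 4096);	/* neither the old nor the final size */
	assert(log.seen[1] == 4190);
}
```

In this model the reader can see 4096, a size that is neither the size before the write (4090) nor the size after it (4190).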

Re: [Cluster-devel] [PATCH v1 3/4] f2fs: fix race between llseek SEEK_END and write

2018-11-21 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 11:43:59AM +0900, Eiichi Tsukata wrote: > This patch itself seems to be just a cleanup but with the > commit b25bd1d9fd87 ("vfs: fix race between llseek SEEK_END and write") > it fixes the race. Please move this patch to the beginning of the series and replace the commit log

[Cluster-devel] [PATCH v1 0/4] fs: fix race between llseek SEEK_END and write

2018-11-21 Thread Eiichi Tsukata
Some file systems (including ext4, xfs, ramfs ...) have the following problem as I've described in the commit message of the 1/4 patch. The commit ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek") removed almost all locks in llseek() including SEEK_END. It was based on the idea

[Cluster-devel] [PATCH v1 3/4] f2fs: fix race between llseek SEEK_END and write

2018-11-21 Thread Eiichi Tsukata
This patch itself seems to be just a cleanup but with the commit b25bd1d9fd87 ("vfs: fix race between llseek SEEK_END and write") it fixes the race. Signed-off-by: Eiichi Tsukata --- fs/f2fs/file.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/fs/f2fs/file.c

Re: [Cluster-devel] [PATCH V11 17/19] block: document usage of bio iterator helpers

2018-11-21 Thread Nikolay Borisov
On 21.11.18 at 5:23, Ming Lei wrote: > Now that multi-page bvecs are supported, some helpers may return data page by > page while others return it segment by segment; this patch > documents the usage. > > Signed-off-by: Ming Lei > --- > Documentation/block/biovecs.txt | 24

Re: [Cluster-devel] [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-21 Thread Christoph Hellwig
On Tue, Nov 20, 2018 at 09:35:07PM -0800, Sagi Grimberg wrote: >> Given it is over TCP, I guess it should be doable for you to preallocate one >> 256-bvec table in one page for each request, then set the max segment size >> as >> (unsigned int)-1, and max segment number as 256, the preallocated

Re: [Cluster-devel] [PATCH V11 03/19] block: introduce bio_for_each_bvec()

2018-11-21 Thread Christoph Hellwig
> +#define bio_iter_mp_iovec(bio, iter) \ > + segment_iter_bvec((bio)->bi_io_vec, (iter)) Besides the mp naming we'd like to get rid off there also is just a single user of this macro, please just expand it there. > +#define segment_iter_bvec(bvec, iter)

Re: [Cluster-devel] [PATCH V11 14/19] block: handle non-cluster bio out of blk_bio_segment_split

2018-11-21 Thread Ming Lei
On Wed, Nov 21, 2018 at 03:33:55PM +0100, Christoph Hellwig wrote: > > + non-cluster.o > > Do we really need a new source file for these few functions? > > > default: > > + if (!blk_queue_cluster(q)) { > > + blk_queue_non_cluster_bio(q, bio); > >

Re: [Cluster-devel] [PATCH V11 14/19] block: handle non-cluster bio out of blk_bio_segment_split

2018-11-21 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 11:37:27PM +0800, Ming Lei wrote: > > + bio = bio_alloc_bioset(GFP_NOIO, bio_segments(*bio_orig), > > + _cluster_bio_set); > > bio_segments(*bio_orig) may be > 256, so bio_alloc_bioset() may fail. Nothing a little min with BIO_MAX_PAGES couldn't fix.
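The "little min" suggested above could look roughly like the following. This is a standalone sketch, not the kernel code: `clamp_nr_vecs()` is a hypothetical helper, and `MODEL_BIO_MAX_PAGES` is assumed to be 256 only because the quoted text says the failure occurs above 256.

```c
#include <assert.h>

/* Assumption: 256, taken from the "> 256" in the quoted message. */
#define MODEL_BIO_MAX_PAGES 256U

/* Hypothetical helper: cap the number of bvecs requested for one bio. */
static unsigned int clamp_nr_vecs(unsigned int nr_segs)
{
	return nr_segs < MODEL_BIO_MAX_PAGES ? nr_segs : MODEL_BIO_MAX_PAGES;
}
```

With this cap, a request for 300 vectors is reduced to 256, while a request for 17 is passed through unchanged.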

Re: [Cluster-devel] [PATCH V11 03/19] block: introduce bio_for_each_bvec()

2018-11-21 Thread Ming Lei
On Wed, Nov 21, 2018 at 02:32:44PM +0100, Christoph Hellwig wrote: > > +#define bio_iter_mp_iovec(bio, iter) \ > > + segment_iter_bvec((bio)->bi_io_vec, (iter)) > > Besides the mp naming we'd like to get rid of, there is also just > a single user of this macro,

Re: [Cluster-devel] [PATCH V11 15/19] block: enable multipage bvecs

2018-11-21 Thread Ming Lei
On Wed, Nov 21, 2018 at 03:55:02PM +0100, Christoph Hellwig wrote: > On Wed, Nov 21, 2018 at 11:23:23AM +0800, Ming Lei wrote: > > if (bio->bi_vcnt > 0) { > > - struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1]; > > + struct bio_vec bv; > > + struct bio_vec *seg =

Re: [Cluster-devel] [PATCH V11 02/19] block: introduce multi-page bvec helpers

2018-11-21 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 11:06:11PM +0800, Ming Lei wrote: > bvec_iter_* is used for single-page bvec in current linus tree, and there are > lots of users now: > > [linux]$ git grep -n "bvec_iter_*" ./ | wc > 191 995 13242 > > If we have to switch it first, it can be a big change, just

Re: [Cluster-devel] [PATCH V11 03/19] block: introduce bio_for_each_bvec()

2018-11-21 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 11:31:36PM +0800, Ming Lei wrote: > > But while looking over this I wonder why we even need the max_seg_len > > here. The only thing __bvec_iter_advance does is to move bi_bvec_done > > and bi_idx forward, with corresponding decrements of bi_size. As far > > as I can tell

Re: [Cluster-devel] [PATCH V11 15/19] block: enable multipage bvecs

2018-11-21 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 11:48:13PM +0800, Ming Lei wrote: > I guess the correct check should be: > > end_addr = vec_addr + bv->bv_offset + bv->bv_len; > if (same_page && > (end_addr & PAGE_MASK) != (page_addr & PAGE_MASK)) >
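The quoted condition can be tried out in isolation with a small userspace sketch. The message is truncated, so what `page_addr` refers to in the real patch is not known here; `crosses_out_of_page()`, `SKETCH_PAGE_SIZE` and `SKETCH_PAGE_MASK` are hypothetical names, and a 4096-byte page is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4096-byte pages. */
#define SKETCH_PAGE_SIZE ((uintptr_t)4096)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Mirrors the quoted comparison: true when the end address of the segment
 * lies in a different page than page_addr. Note that end_addr is one past
 * the last byte, so a segment that ends exactly at the end of a page
 * already compares as being in the next page.
 */
static bool crosses_out_of_page(uintptr_t vec_addr, uintptr_t bv_offset,
				uintptr_t bv_len, uintptr_t page_addr)
{
	uintptr_t end_addr = vec_addr + bv_offset + bv_len;

	return (end_addr & SKETCH_PAGE_MASK) != (page_addr & SKETCH_PAGE_MASK);
}
```

For a segment of 512 bytes starting at 0x10000 the function returns false against page 0x10000; for 4097 bytes it returns true.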

Re: [Cluster-devel] [PATCH V11 02/19] block: introduce multi-page bvec helpers

2018-11-21 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 11:23:10AM +0800, Ming Lei wrote: > This patch introduces helpers of 'segment_iter_*' for multipage > bvec support. > > The introduced helpers treat one bvec as a real multi-page segment, > which may include more than one page. Unless I'm missing something these bvec vs

Re: [Cluster-devel] [PATCH V11 14/19] block: handle non-cluster bio out of blk_bio_segment_split

2018-11-21 Thread Christoph Hellwig
> + non-cluster.o Do we really need a new source file for these few functions? > default: > + if (!blk_queue_cluster(q)) { > + blk_queue_non_cluster_bio(q, bio); > + return; I'd name this

Re: [Cluster-devel] [PATCH V11 02/19] block: introduce multi-page bvec helpers

2018-11-21 Thread Ming Lei
On Wed, Nov 21, 2018 at 02:19:28PM +0100, Christoph Hellwig wrote: > On Wed, Nov 21, 2018 at 11:23:10AM +0800, Ming Lei wrote: > > This patch introduces helpers of 'segment_iter_*' for multipage > > bvec support. > > > > The introduced helpers treat one bvec as a real multi-page segment, > > which

Re: [Cluster-devel] [PATCH V11 15/19] block: enable multipage bvecs

2018-11-21 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 11:23:23AM +0800, Ming Lei wrote: > if (bio->bi_vcnt > 0) { > - struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1]; > + struct bio_vec bv; > + struct bio_vec *seg = &bio->bi_io_vec[bio->bi_vcnt - 1]; > > - if (page ==

Re: [Cluster-devel] [PATCH V11 10/19] block: loop: pass multi-page bvec to iov_iter

2018-11-21 Thread Christoph Hellwig
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index 1ad6eafc43f2..a281b6737b61 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -805,6 +805,10 @@ struct req_iterator { > __rq_for_each_bio(_iter.bio, _rq) \ >

Re: [Cluster-devel] [PATCH V11 11/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2018-11-21 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 11:23:19AM +0800, Ming Lei wrote: > bch_bio_alloc_pages() is always called on one new bio, so it is safe > to access the bvec table directly. Given this is the only case of its > kind, open code the bvec table access since bio_for_each_segment_all() > will be changed to

Re: [Cluster-devel] [PATCH V11 12/19] block: allow bio_for_each_segment_all() to iterate over multi-page bvec

2018-11-21 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 11:23:20AM +0800, Ming Lei wrote: > This patch introduces one extra iterator variable to > bio_for_each_segment_all(), > then we can allow bio_for_each_segment_all() to iterate over multi-page bvec. > > Given it is just one mechanical & simple change on all >

Re: [Cluster-devel] [PATCH V11 17/19] block: document usage of bio iterator helpers

2018-11-21 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 09:45:25AM +0200, Nikolay Borisov wrote: > > + bio_for_each_segment_all() > > + bio_first_bvec_all() > > + bio_first_page_all() > > + bio_last_bvec_all() > > + > > +* The following helpers iterate over single-page bvecs. The passed 'struct > > +bio_vec' will contain

Re: [Cluster-devel] [PATCH V11 14/19] block: handle non-cluster bio out of blk_bio_segment_split

2018-11-21 Thread Christoph Hellwig
Actually.. I think we can kill this code entirely. If we look at what the clustering setting is really about it is to avoid ever merging a segment that spans a page boundary. And we should be able to do that with something like this before your series: --- From

Re: [Cluster-devel] [PATCH v1 0/4] fs: fix race between llseek SEEK_END and write

2018-11-21 Thread Al Viro
On Thu, Nov 22, 2018 at 02:40:50PM +0900, Eiichi Tsukata wrote: > On Wed, Nov 21, 2018 at 13:54, Al Viro wrote: > > > > On Wed, Nov 21, 2018 at 11:43:56AM +0900, Eiichi Tsukata wrote: > > > Some file systems (including ext4, xfs, ramfs ...) have the following > > > problem as I've described in the commit message

Re: [Cluster-devel] [PATCH V11 02/19] block: introduce multi-page bvec helpers

2018-11-21 Thread Ming Lei
On Wed, Nov 21, 2018 at 05:08:11PM +0100, Christoph Hellwig wrote: > On Wed, Nov 21, 2018 at 11:06:11PM +0800, Ming Lei wrote: > > bvec_iter_* is used for single-page bvec in current linus tree, and there > > are > > lots of users now: > > > > [linux]$ git grep -n "bvec_iter_*" ./ | wc > >

Re: [Cluster-devel] [PATCH v1 3/4] f2fs: fix race between llseek SEEK_END and write

2018-11-21 Thread Eiichi Tsukata
On Wed, Nov 21, 2018 at 18:23, Christoph Hellwig wrote: > > On Wed, Nov 21, 2018 at 11:43:59AM +0900, Eiichi Tsukata wrote: > > This patch itself seems to be just a cleanup but with the > > commit b25bd1d9fd87 ("vfs: fix race between llseek SEEK_END and write") > > it fixes the race. > > Please move this patch to

Re: [Cluster-devel] [PATCH V11 03/19] block: introduce bio_for_each_bvec()

2018-11-21 Thread Ming Lei
On Wed, Nov 21, 2018 at 05:10:25PM +0100, Christoph Hellwig wrote: > On Wed, Nov 21, 2018 at 11:31:36PM +0800, Ming Lei wrote: > > > But while looking over this I wonder why we even need the max_seg_len > > > here. The only thing __bvec_iter_advance does is to move bi_bvec_done > > > and bi_idx

Re: [Cluster-devel] [PATCH v1 0/4] fs: fix race between llseek SEEK_END and write

2018-11-21 Thread Eiichi Tsukata
On Wed, Nov 21, 2018 at 13:54, Al Viro wrote: > > On Wed, Nov 21, 2018 at 11:43:56AM +0900, Eiichi Tsukata wrote: > > Some file systems (including ext4, xfs, ramfs ...) have the following > > problem as I've described in the commit message of the 1/4 patch. > > > > The commit ef3d0fd27e90 ("vfs: do (nearly)

Re: [Cluster-devel] [PATCH V11 03/19] block: introduce bio_for_each_bvec()

2018-11-21 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 05:10:25PM +0100, Christoph Hellwig wrote: > No - I think we can always use the code without any segment in > bvec_iter_advance. Because bvec_iter_advance only operates on the > iterator, the generation of an actual single-page or multi-page > bvec is left to the caller
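The point that advancing only touches the iterator can be seen in a userspace model of the advance step. This is a sketch under assumptions, not the kernel implementation: `sketch_bvec`, `sketch_iter` and `sketch_iter_advance()` are hypothetical names that only borrow the field names bi_size, bi_idx and bi_bvec_done from the discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a bvec; only its length is needed to advance. */
struct sketch_bvec {
	unsigned int bv_len;
};

/* Field names follow the discussion; the struct itself is a model. */
struct sketch_iter {
	unsigned int bi_size;		/* bytes left in the whole bio */
	unsigned int bi_idx;		/* index of the current bvec */
	unsigned int bi_bvec_done;	/* bytes consumed in the current bvec */
};

/*
 * Move the iterator forward by 'bytes'. Nothing here depends on the page
 * size: a bvec may cover one page or many, the arithmetic is the same.
 */
static bool sketch_iter_advance(const struct sketch_bvec *bv,
				struct sketch_iter *iter, unsigned int bytes)
{
	if (bytes > iter->bi_size) {
		/* Asked to go past the end: mark the iterator exhausted. */
		iter->bi_size = 0;
		return false;
	}

	while (bytes) {
		unsigned int left = bv[iter->bi_idx].bv_len - iter->bi_bvec_done;
		unsigned int len = bytes < left ? bytes : left;

		bytes -= len;
		iter->bi_size -= len;
		iter->bi_bvec_done += len;

		/* Current bvec fully consumed: step to the next one. */
		if (iter->bi_bvec_done == bv[iter->bi_idx].bv_len) {
			iter->bi_bvec_done = 0;
			iter->bi_idx++;
		}
	}
	return true;
}
```

In this model an 8192-byte bvec followed by a 4096-byte one is walked with the same code as a list of single-page bvecs; whether the caller then builds a single-page or a multi-page view from the iterator position is a separate step.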