Re: [Cluster-devel] [GFS2 PATCH] GFS2: Add to tail, not head, of transaction
Hi, On 30/05/18 20:24, Bob Peterson wrote: Hi, Before this patch, frunction gfs2_trans_add_meta called list_add to add a buffer to a transaction, tr_buf. Later, in the before_commit functions, it traversed the list in sequential order, which meant that they were processed in a sub-optimal order. For example, blocks could go out in 54321 order rather than 12345, causing media heads to bounce unnecessarily. This makes no difference for small IO operations, but large writes benefit greatly when they add lots of indirect blocks to a transaction, and those blocks are allocated in ascending order, as the block allocator tries to do. This patch changes it to list_add_tail so they are traversed (and therefore written back) in the same order as they are added to the transaction. In one of the more extreme examples, I did a test where I had 10 simultaneous instances of iozone, each writing 25GB of data, using one of our performance testing machines. The results are: Without the patch With the patch - -- Children see throughput 1395068.00 kB/sec 1527073.61 kB/sec Parent sees throughput 1348544.00 kB/sec 1485594.66 kB/sec These numbers are artificially inflated because I was also running with Andreas Gruenbacher's iomap-write patch set plus my rgrp sharing patch set in both these cases. Still, it shows a 9 percent performance boost for both children and parent throughput. Signed-off-by: Bob Peterson That makes sense assuming that a transaction will always add the blocks in ascending order. Writing into the journal will not be affected by this, since that will always be sequential anyway, so I assume that you are referring to AIL writeback here? It might make more sense just to sort the blocks into order at journal flush time, since then we'll always be in order, no matter what order the blocks are added during the transactions, Steve. --- fs/gfs2/trans.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/gfs2/trans.c b/fs/gfs2/trans.c index c75cacaa349b..c4c00b90935c 100644 --- a/fs/gfs2/trans.c +++ b/fs/gfs2/trans.c @@ -247,7 +247,7 @@ void gfs2_trans_add_meta(struct gfs2_glock *gl, struct buffer_head *bh) gfs2_pin(sdp, bd->bd_bh); mh->__pad0 = cpu_to_be64(0); mh->mh_jid = cpu_to_be32(sdp->sd_jdesc->jd_jid); - list_add(&bd->bd_list, &tr->tr_buf); + list_add_tail(&bd->bd_list, &tr->tr_buf); tr->tr_num_buf_new++; out_unlock: gfs2_log_unlock(sdp);
[Cluster-devel] [GFS2 PATCH] GFS2: Add to tail, not head, of transaction
Hi, Before this patch, frunction gfs2_trans_add_meta called list_add to add a buffer to a transaction, tr_buf. Later, in the before_commit functions, it traversed the list in sequential order, which meant that they were processed in a sub-optimal order. For example, blocks could go out in 54321 order rather than 12345, causing media heads to bounce unnecessarily. This makes no difference for small IO operations, but large writes benefit greatly when they add lots of indirect blocks to a transaction, and those blocks are allocated in ascending order, as the block allocator tries to do. This patch changes it to list_add_tail so they are traversed (and therefore written back) in the same order as they are added to the transaction. In one of the more extreme examples, I did a test where I had 10 simultaneous instances of iozone, each writing 25GB of data, using one of our performance testing machines. The results are: Without the patch With the patch - -- Children see throughput 1395068.00 kB/sec 1527073.61 kB/sec Parent sees throughput 1348544.00 kB/sec 1485594.66 kB/sec These numbers are artificially inflated because I was also running with Andreas Gruenbacher's iomap-write patch set plus my rgrp sharing patch set in both these cases. Still, it shows a 9 percent performance boost for both children and parent throughput. Signed-off-by: Bob Peterson --- fs/gfs2/trans.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/gfs2/trans.c b/fs/gfs2/trans.c index c75cacaa349b..c4c00b90935c 100644 --- a/fs/gfs2/trans.c +++ b/fs/gfs2/trans.c @@ -247,7 +247,7 @@ void gfs2_trans_add_meta(struct gfs2_glock *gl, struct buffer_head *bh) gfs2_pin(sdp, bd->bd_bh); mh->__pad0 = cpu_to_be64(0); mh->mh_jid = cpu_to_be32(sdp->sd_jdesc->jd_jid); - list_add(&bd->bd_list, &tr->tr_buf); + list_add_tail(&bd->bd_list, &tr->tr_buf); tr->tr_num_buf_new++; out_unlock: gfs2_log_unlock(sdp);
Re: [Cluster-devel] [GFS2 PATCH] GFS2: Add to tail, not head, of transaction
On 9 May 2018 at 15:21, Bob Peterson wrote: > Hi, > > Before this patch, frunction gfs2_trans_add_meta called list_add > to add a buffer to a transaction, tr_buf. Later, in the before_commit > functions, it traversed the list in sequential order, which meant > that they were processed in a sub-optimal order. For example, blocks > could go out in 54321 order rather than 12345, causing media heads > to bounce unnecessarily. Hmm, the patch itself looks reasonable, but given that gfs2_before_commit sorts the list by block number before iterating it, the above explanation seems to be at least partially wrong. Could you please clarify what's really going on? > This makes no difference for small IO operations, but large writes > benefit greatly when they add lots of indirect blocks to a > transaction, and those blocks are allocated in ascending order, > as the block allocator tries to do. > > This patch changes it to list_add_tail so they are traversed (and > therefore written back) in the same order as they are added to > the transaction. > > In one of the more extreme examples, I did a test where I had > 10 simultaneous instances of iozone, each writing 25GB of data, > using one of our performance testing machines. The results are: > > Without the patch With the patch > - -- > Children see throughput 1395068.00 kB/sec 1527073.61 kB/sec > Parent sees throughput 1348544.00 kB/sec 1485594.66 kB/sec > > These numbers are artificially inflated because I was also running > with Andreas Gruenbacher's iomap-write patch set plus my rgrp > sharing patch set in both these cases. Still, it shows a 9 percent > performance boost for both children and parent throughput. > > Signed-off-by: Bob Peterson > --- > fs/gfs2/trans.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/gfs2/trans.c b/fs/gfs2/trans.c > index c75cacaa349b..c4c00b90935c 100644 > --- a/fs/gfs2/trans.c > +++ b/fs/gfs2/trans.c > @@ -247,7 +247,7 @@ void gfs2_trans_add_meta(struct gfs2_glock *gl, struct > buffer_head *bh) > gfs2_pin(sdp, bd->bd_bh); > mh->__pad0 = cpu_to_be64(0); > mh->mh_jid = cpu_to_be32(sdp->sd_jdesc->jd_jid); > - list_add(&bd->bd_list, &tr->tr_buf); > + list_add_tail(&bd->bd_list, &tr->tr_buf); > tr->tr_num_buf_new++; > out_unlock: > gfs2_log_unlock(sdp); > Thanks, Andreas
[Cluster-devel] [GFS2 PATCH] GFS2: Add to tail, not head, of transaction
Hi, Before this patch, frunction gfs2_trans_add_meta called list_add to add a buffer to a transaction, tr_buf. Later, in the before_commit functions, it traversed the list in sequential order, which meant that they were processed in a sub-optimal order. For example, blocks could go out in 54321 order rather than 12345, causing media heads to bounce unnecessarily. This makes no difference for small IO operations, but large writes benefit greatly when they add lots of indirect blocks to a transaction, and those blocks are allocated in ascending order, as the block allocator tries to do. This patch changes it to list_add_tail so they are traversed (and therefore written back) in the same order as they are added to the transaction. In one of the more extreme examples, I did a test where I had 10 simultaneous instances of iozone, each writing 25GB of data, using one of our performance testing machines. The results are: Without the patch With the patch - -- Children see throughput 1395068.00 kB/sec 1527073.61 kB/sec Parent sees throughput 1348544.00 kB/sec 1485594.66 kB/sec These numbers are artificially inflated because I was also running with Andreas Gruenbacher's iomap-write patch set plus my rgrp sharing patch set in both these cases. Still, it shows a 9 percent performance boost for both children and parent throughput. Signed-off-by: Bob Peterson --- fs/gfs2/trans.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/gfs2/trans.c b/fs/gfs2/trans.c index c75cacaa349b..c4c00b90935c 100644 --- a/fs/gfs2/trans.c +++ b/fs/gfs2/trans.c @@ -247,7 +247,7 @@ void gfs2_trans_add_meta(struct gfs2_glock *gl, struct buffer_head *bh) gfs2_pin(sdp, bd->bd_bh); mh->__pad0 = cpu_to_be64(0); mh->mh_jid = cpu_to_be32(sdp->sd_jdesc->jd_jid); - list_add(&bd->bd_list, &tr->tr_buf); + list_add_tail(&bd->bd_list, &tr->tr_buf); tr->tr_num_buf_new++; out_unlock: gfs2_log_unlock(sdp);