Re: [Cluster-devel] [GFS2 PATCH] GFS2: Don't brelse rgrp buffer_heads every allocation
- Original Message -
> I'll see if I can track down why the rgrplvb option isn't performing as
> well. I suspect the matter goes back to my first comment above: namely,
> that the slowdown comes from the slowness of page cache lookup for the
> buffers of the rgrps we are using (not the rejected ones).

Hi,

I did verify that the vast majority of the time is spent doing page cache lookups, in function gfs2_getbuf(). The time spent in gfs2_meta_read() and gfs2_meta_wait() is minimal--almost nil--presumably because the page is already in cache. The rgrplvb option isn't improving this because it still calls gfs2_rgrp_bh_get(), which still does all the page cache lookups for the rgrp we need. The rgrplvb option should, in theory, save us a lot of time when searching for a suitable rgrp, especially when there are multiple nodes doing allocations.

I can't think of any reason why my patch would be incompatible with rgrplvb, but I'd like to ask Ben Marzinski to scrutinize my patch carefully and see if he can find any flaws in the design.

Regards,

Bob Peterson
Red Hat File Systems
[Cluster-devel] [GFS2 PATCH] GFS2: Don't add all glocks to the lru
Hi,

Regarding my previously posted patch: I decided it makes more sense not to single out rgrp glocks for exclusion from the lru list. It makes sense to also exclude the transaction glock and non-disk glocks from the lru list as well. Therefore, I changed this to a generic glops flag so we can specify them on an individual basis.

Patch description:

The glocks used for resource groups often come and go hundreds of thousands of times per second. Adding them to the lru list just adds unnecessary contention for the lru_lock spin_lock, especially considering we're almost certainly going to re-use the glock and take it back off the lru microseconds later. We never want the glock shrinker to cull them anyway. This patch adds a new bit in the glops that determines which glock types get put onto the lru list and which ones don't.

Regards,

Bob Peterson

Signed-off-by: Bob Peterson rpete...@redhat.com
---
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 0fa8062..a38e38f 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1076,7 +1076,8 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
 		    !test_bit(GLF_DEMOTE, &gl->gl_flags))
 			fast_path = 1;
 	}
-	if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl))
+	if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl) &&
+	    (glops->go_flags & GLOF_LRU))
 		gfs2_glock_add_to_lru(gl);
 
 	trace_gfs2_glock_queue(gh, 0);
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index fe91951..1249b2b 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -561,7 +561,7 @@ const struct gfs2_glock_operations gfs2_inode_glops = {
 	.go_lock = inode_go_lock,
 	.go_dump = inode_go_dump,
 	.go_type = LM_TYPE_INODE,
-	.go_flags = GLOF_ASPACE,
+	.go_flags = GLOF_ASPACE | GLOF_LRU,
 };
 
 const struct gfs2_glock_operations gfs2_rgrp_glops = {
@@ -584,10 +584,12 @@ const struct gfs2_glock_operations gfs2_freeze_glops = {
 const struct gfs2_glock_operations gfs2_iopen_glops = {
 	.go_type = LM_TYPE_IOPEN,
 	.go_callback = iopen_go_callback,
+	.go_flags = GLOF_LRU,
 };
 
 const struct gfs2_glock_operations gfs2_flock_glops = {
 	.go_type = LM_TYPE_FLOCK,
+	.go_flags = GLOF_LRU,
 };
 
 const struct gfs2_glock_operations gfs2_nondisk_glops = {
@@ -596,7 +598,7 @@ const struct gfs2_glock_operations gfs2_nondisk_glops = {
 
 const struct gfs2_glock_operations gfs2_quota_glops = {
 	.go_type = LM_TYPE_QUOTA,
-	.go_flags = GLOF_LVB,
+	.go_flags = GLOF_LVB | GLOF_LRU,
 };
 
 const struct gfs2_glock_operations gfs2_journal_glops = {
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 304a223..a1ec7c2 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -225,6 +225,7 @@ struct gfs2_glock_operations {
 	const unsigned long go_flags;
 #define GLOF_ASPACE 1
 #define GLOF_LVB    2
+#define GLOF_LRU    4
 };
 
 enum {
Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check
Hi David,

David Teigland wrote:
> On Thu, Jun 11, 2015 at 05:47:28PM +0800, Guoqing Jiang wrote:
>> Would you consider taking the following cleanup? If yes, I will send a
>> formal patch; otherwise please ignore it.
>
> On first glance, the old and new code do not appear to do the same
> thing, so let's leave it as it is.
>
> -	to_nodeid = dlm_dir_nodeid(r);

Sorry, it seems that is the only difference. If the previous change is combined with the modification below, then the behavior is the same:

@@ -3644,7 +3644,10 @@ static int send_common(struct dlm_rsb *r, struct dlm_lkb *lkb, int mstype)
 	struct dlm_mhandle *mh;
 	int to_nodeid, error;
 
-	to_nodeid = r->res_nodeid;
+	if (mstype == DLM_MSG_LOOKUP)
+		to_nodeid = dlm_dir_nodeid(r);
+	else
+		to_nodeid = r->res_nodeid;

And for create_message(), the second parameter (lkb) is not used when creating the three message types (REQUEST/LOOKUP/REMOVE).

Thanks,
Guoqing
Re: [Cluster-devel] [GFS2 PATCH] GFS2: Don't brelse rgrp buffer_heads every allocation
Hi,

On 15/06/15 15:43, Bob Peterson wrote:
> - Original Message -
>> I'm assuming that these figures are bandwidth rather than times, since
>> that appears to show that the patch makes quite a large difference.
>> However the reclen is rather small. In the 32 bytes case, that's 128
>> writes for each new block that's being allocated, unless of course
>> that is 32k?
>>
>> Steve.
>
> Hi,
>
> To do this test, I'm executing this command:
>
> numactl --cpunodebind=0 --membind=0 /home/bob/iozone/iozone3_429/src/current/iozone -az -f /mnt/gfs2/iozone-gfs2 -n 2048m -g 2048m -y 32k -q 1m -e -i 0 -+n /home/bob/iozone.out
>
> According to iozone -h, specifying -y this way is 32K, not 32 bytes.
> The -q option is the maximum write size, in KB, so -q 1m is 1MB writes.
> The -g option is the maximum file size, in KB, so -g 2048m is a 2GB
> file. So the test varies the writes from 32K to 1MB, adjusting the
> number of writes to get the file to 2GB.
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems

Ok, that makes sense to me. I guess there is not a lot of searching for rgrps going on, which is why we are not seeing a big gain in the rgrplvb case. If the fs were nearly full, perhaps we'd see more of a difference. Either way, provided the rgrplvb code continues to work, that is really the important thing, so I think we should be ok with this. I'd still very much like to see what can be done to reduce the page lookup cost more directly, though, since that seems to be the real issue here,

Steve.