Re: [Cluster-devel] [GFS2 PATCH] GFS2: Don't brelse rgrp buffer_heads every allocation

2015-06-16 Thread Bob Peterson
- Original Message -
 I'll see if I can track down why the rgrplvb option isn't performing as well.
 I suspect the matter goes back to my first comment above. Namely, that the
 slowdown goes back to the slowness of page cache lookup for the buffers of
 the rgrps we are using (not rejected ones).

Hi,

I did verify that the vast majority of time is spent doing page cache lookups
in function gfs2_getbuf(). The time spent in gfs2_meta_read() and
gfs2_meta_wait() is minimal--almost nil--presumably because the page is
already in cache.

The rgrplvb option isn't improving this because it still calls gfs2_rgrp_bh_get,
which does all the page cache lookups for the rgrp we need. The rgrplvb
option should, in theory, save us a lot of time when searching for a suitable
rgrp, especially when there are multiple nodes doing allocations.
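
For illustration only, here is a small user-space analogue (not GFS2 code;
every name below is made up) of the two patterns being compared: doing a
hash lookup on every allocation, the way repeated gfs2_getbuf() calls do,
versus taking the reference once and holding it across allocations, which
is what my patch does for the rgrp buffer_heads:

/*
 * Not GFS2 code: a toy "page cache" as a hash of chained buffers.
 * getbuf() stands in for the per-allocation lookup; putbuf() stands
 * in for brelse().  Build with: cc -O2 lookup_cost.c -o lookup_cost
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUCKETS 1024
#define CHAIN   8                       /* a few entries per bucket */
#define LOOPS   (64UL * 1024 * 1024)

struct buf {
	unsigned long blkno;
	int refcount;
	struct buf *next;
};

static struct buf *hash[BUCKETS];

static struct buf *getbuf(unsigned long blkno)  /* lookup + take reference */
{
	struct buf *b;

	for (b = hash[blkno % BUCKETS]; b; b = b->next)
		if (b->blkno == blkno) {
			b->refcount++;
			return b;
		}
	return NULL;
}

static void putbuf(struct buf *b)               /* drop the reference */
{
	b->refcount--;
}

static double since(struct timespec *t0)
{
	struct timespec t1;
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (t1.tv_sec - t0->tv_sec) + (t1.tv_nsec - t0->tv_nsec) / 1e9;
}

int main(void)
{
	struct timespec t0;
	unsigned long i, sum = 0;

	/* Populate the toy cache with block numbers 0 .. BUCKETS*CHAIN-1. */
	for (i = 0; i < BUCKETS * CHAIN; i++) {
		struct buf *b = calloc(1, sizeof(*b));
		b->blkno = i;
		b->next = hash[i % BUCKETS];
		hash[i % BUCKETS] = b;
	}

	/* Pattern 1: look the buffer up (and release it) on every allocation. */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < LOOPS; i++) {
		struct buf *b = getbuf(42);
		sum += b->blkno;
		putbuf(b);
	}
	printf("lookup per allocation: %.2fs\n", since(&t0));

	/* Pattern 2: take the reference once, keep it across allocations. */
	struct buf *cached = getbuf(42);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < LOOPS; i++)
		sum += cached->blkno;
	putbuf(cached);
	printf("cached reference:      %.2fs\n", since(&t0));

	return (int)(sum & 1);
}

Obviously the real page cache lookup is a radix tree walk plus locking, so
the absolute numbers mean nothing here; the point is only that the lookup
itself is the cost that repeats on every allocation.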

I can't think of any reason why my patch would be incompatible with rgrplvb,
but I'd like to ask Ben Marzinski to scrutinize my patch carefully and see
if he can find any flaws in the design.

Regards,

Bob Peterson
Red Hat File Systems



[Cluster-devel] [GFS2 PATCH] GFS2: Don't add all glocks to the lru

2015-06-16 Thread Bob Peterson
Hi,

Regarding my previously posted patch:
I decided it makes more sense not to single out rgrp glocks for
exclusion from the lru list. It makes sense to exclude the
transaction glock and non-disk glocks from the lru list as well.
Therefore, I changed this to a generic glops flag so we can
specify it on a per-glock-type basis.

Patch description:

The glocks used for resource groups often come and go hundreds of
thousands of times per second. Adding them to the lru list just
adds unnecessary contention for the lru_lock spin_lock, especially
considering we're almost certainly going to re-use the glock and
take it back off the lru microseconds later. We never want the
glock shrinker to cull them anyway. This patch adds a new bit in
the glops that determines which glock types get put onto the lru
list and which ones don't.

Regards,

Bob Peterson

Signed-off-by: Bob Peterson <rpete...@redhat.com>
---
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 0fa8062..a38e38f 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1076,7 +1076,8 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
    !test_bit(GLF_DEMOTE, &gl->gl_flags))
        fast_path = 1;
    }
-   if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl))
+   if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl) &&
+       (glops->go_flags & GLOF_LRU))
        gfs2_glock_add_to_lru(gl);
 
trace_gfs2_glock_queue(gh, 0);
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index fe91951..1249b2b 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -561,7 +561,7 @@ const struct gfs2_glock_operations gfs2_inode_glops = {
.go_lock = inode_go_lock,
.go_dump = inode_go_dump,
.go_type = LM_TYPE_INODE,
-   .go_flags = GLOF_ASPACE,
+   .go_flags = GLOF_ASPACE | GLOF_LRU,
 };
 
 const struct gfs2_glock_operations gfs2_rgrp_glops = {
@@ -584,10 +584,12 @@ const struct gfs2_glock_operations gfs2_freeze_glops = {
 const struct gfs2_glock_operations gfs2_iopen_glops = {
.go_type = LM_TYPE_IOPEN,
.go_callback = iopen_go_callback,
+   .go_flags = GLOF_LRU,
 };
 
 const struct gfs2_glock_operations gfs2_flock_glops = {
.go_type = LM_TYPE_FLOCK,
+   .go_flags = GLOF_LRU,
 };
 
 const struct gfs2_glock_operations gfs2_nondisk_glops = {
@@ -596,7 +598,7 @@ const struct gfs2_glock_operations gfs2_nondisk_glops = {
 
 const struct gfs2_glock_operations gfs2_quota_glops = {
.go_type = LM_TYPE_QUOTA,
-   .go_flags = GLOF_LVB,
+   .go_flags = GLOF_LVB | GLOF_LRU,
 };
 
 const struct gfs2_glock_operations gfs2_journal_glops = {
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 304a223..a1ec7c2 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -225,6 +225,7 @@ struct gfs2_glock_operations {
const unsigned long go_flags;
 #define GLOF_ASPACE 1
#define GLOF_LVB	2
+#define GLOF_LRU	4
 };
 
 enum {
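
As an aside, and purely for illustration, here is a minimal user-space
sketch of the flag-gated LRU idea the patch implements (this is not GFS2
code; every name below is made up):

/* Object types opt in to the LRU with a flag, so types that churn
 * constantly never touch the shared lru_lock at all.
 * Build with: cc -O2 -pthread lru_flag.c -o lru_flag
 */
#include <pthread.h>
#include <stdio.h>

#define OBJF_LRU 0x1

struct obj_ops {
	const char *name;
	unsigned long flags;
};

struct obj {
	const struct obj_ops *ops;
	struct obj *lru_next;
	int on_lru;
};

static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj *lru_head;
static unsigned long lru_adds;

static void obj_release(struct obj *o)
{
	/* Analogue of gfs2_glock_dq(): only LRU-flagged types take the lock. */
	if (!(o->ops->flags & OBJF_LRU) || o->on_lru)
		return;
	pthread_mutex_lock(&lru_lock);
	o->lru_next = lru_head;
	lru_head = o;
	o->on_lru = 1;
	lru_adds++;
	pthread_mutex_unlock(&lru_lock);
}

static const struct obj_ops inode_ops = { "inode", OBJF_LRU };
static const struct obj_ops rgrp_ops  = { "rgrp",  0 };  /* never culled */

int main(void)
{
	struct obj inode = { &inode_ops, NULL, 0 };
	struct obj rgrp  = { &rgrp_ops,  NULL, 0 };

	for (int i = 0; i < 1000; i++) {
		obj_release(&inode);   /* queued once, then already on the lru */
		obj_release(&rgrp);    /* never queued, never takes lru_lock */
	}
	printf("lru additions: %lu\n", lru_adds);
	return 0;
}

The design choice is the same either way: the decision lives with the
type's ops structure, so excluding another churn-heavy type later only
means leaving the flag out of its glops.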



Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-16 Thread Guoqing Jiang
Hi David,

David Teigland wrote:
 On Thu, Jun 11, 2015 at 05:47:28PM +0800, Guoqing Jiang wrote:
   
 Would you consider taking the following cleanup? If yes, I will send a
 formal patch; otherwise please ignore it.
 

 On first glance, the old and new code do not appear to do the same thing,
 so let's leave it as it is.

   
 -   to_nodeid = dlm_dir_nodeid(r);
 
Sorry, it seems that is the only difference; if the previous change is
combined with the modification below, the behavior is the same.

@@ -3644,7 +3644,10 @@ static int send_common(struct dlm_rsb *r, struct dlm_lkb *lkb, int mstype)
    struct dlm_mhandle *mh;
    int to_nodeid, error;
 
-   to_nodeid = r->res_nodeid;
+   if (mstype == DLM_MSG_LOOKUP)
+       to_nodeid = dlm_dir_nodeid(r);
+   else
+       to_nodeid = r->res_nodeid;

Also, for create_message, the second parameter (lkb) has no effect when
creating those three message types (REQUEST/LOOKUP/REMOVE).

Thanks,
Guoqing



Re: [Cluster-devel] [GFS2 PATCH] GFS2: Don't brelse rgrp buffer_heads every allocation

2015-06-16 Thread Steven Whitehouse

Hi,

On 15/06/15 15:43, Bob Peterson wrote:

- Original Message -

I'm assuming that these figures are bandwidth rather than times, since
that appears to show that the patch makes quite a large difference.
However, the reclen is rather small. In the 32-byte case, that's 128
writes for each new block that's being allocated, unless of course that
is 32k?

Steve.

Hi,

To do this test, I'm executing this command:
numactl --cpunodebind=0 --membind=0 /home/bob/iozone/iozone3_429/src/current/iozone \
-az -f /mnt/gfs2/iozone-gfs2 -n 2048m -g 2048m -y 32k -q 1m -e -i 0 -+n > \
/home/bob/iozone.out

According to iozone -h, specifying -y this way is 32K, not 32 bytes.
The -q is maximum write size, in KB, so -q 1m is 1MB writes.
The -g is maximum file size, in KB, so -g 2048m is a 2048MB (2GB) file.

So the test varies the write size from 32K to 1MB, adjusting the number
of writes to bring the file to 2GB.
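
To spell out the arithmetic for the record sizes actually used, here is a
quick sketch (the 4KB filesystem block size is my assumption; it isn't
stated in this thread):

/* Writes per file and new blocks covered per write for each reclen. */
#include <stdio.h>

int main(void)
{
	const unsigned long long file_size = 2048ULL << 20;  /* -g 2048m = 2GB */
	const unsigned long block = 4096;                     /* assumed fs block size */

	for (unsigned long reclen = 32 << 10; reclen <= 1 << 20; reclen <<= 1)
		printf("reclen %7lu: %llu writes, %lu new blocks per write\n",
		       reclen, file_size / reclen, reclen / block);
	return 0;
}

So even at the smallest record size, each write covers several newly
allocated blocks rather than the other way around.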

Regards,

Bob Peterson
Red Hat File Systems


Ok, that makes sense to me. I guess that there is not a lot of searching
for the rgrps going on, which is why we are not seeing a big gain in the
rgrplvb case. If the fs were nearly full, perhaps we'd see more of a
difference.


Either way, the really important thing is that the rgrplvb code can
continue to work, so I think we should be ok with this. I'd still very
much like to see what can be done to reduce the page lookup cost more
directly though, since that seems to be the real issue here,


Steve.