[Cluster-devel] [PATCH 0/5] dlm and gfs2 patches for 3.3

2012-01-05 Thread David Teigland
in userland. This new feature is not used by current dlm_controld and gfs_controld daemons, but will be enabled by a new dlm_controld version under development. Bob Peterson (1): dlm: convert rsb list to rb_tree David Teigland (4): dlm: move recovery barrier calls dlm: add node slots

[Cluster-devel] [PATCH 3/5] dlm: add node slots and generation

2012-01-05 Thread David Teigland
slot number. A new generation number is also added to a lockspace. It is set and incremented during each recovery along with the slot collection/assignment. The slot numbers will be passed to gfs2 which will use them as journal ids. Signed-off-by: David Teigland teigl...@redhat.com --- fs/dlm

[Cluster-devel] [PATCH 2/5] dlm: move recovery barrier calls

2012-01-05 Thread David Teigland
Put all the calls to recovery barriers in the same function to clarify where they each happen. Should not change any behavior. Also modify some recovery debug lines to make them consistent. Signed-off-by: David Teigland teigl...@redhat.com --- fs/dlm/dir.c |1 - fs/dlm/member.c

[Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2012-01-05 Thread David Teigland
to track journals that need recovery Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/glock.c |2 +- fs/gfs2/glock.h |7 +- fs/gfs2/incore.h| 58 +++- fs/gfs2/lock_dlm.c | 993 ++- fs/gfs2

[Cluster-devel] [PATCH 4/5] dlm: add recovery callbacks

2012-01-05 Thread David Teigland
their kernel counterparts. These callbacks allow the same coordination directly, and more simply. Signed-off-by: David Teigland teigl...@redhat.com --- fs/dlm/config.c | 130 ++-- fs/dlm/config.h | 17 +++- fs/dlm/dlm_internal.h | 21 ++ fs/dlm

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2012-01-05 Thread David Teigland
On Thu, Jan 05, 2012 at 04:58:22PM +, Steven Whitehouse wrote: + clear_bit(SDF_NOJOURNALID, &sdp->sd_flags); + smp_mb__after_clear_bit(); + wake_up_bit(&sdp->sd_flags, SDF_NOJOURNALID); + ls->ls_first = !!test_bit(DFL_FIRST_MOUNT, &ls->ls_recover_flags); + return 0; + This

Re: [Cluster-devel] [patch] gfs2: make some sizes unsigned in set_recover_size()

2012-01-04 Thread David Teigland
[patch] dlm: le32 vs le16 gfs2: make some sizes unsigned in set_recover_size() Thanks, I've folded in both of those. Dave

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2011-12-22 Thread David Teigland
On Mon, Dec 19, 2011 at 12:47:38PM -0500, David Teigland wrote: On Mon, Dec 19, 2011 at 01:07:38PM +, Steven Whitehouse wrote: struct lm_lockstruct { int ls_jid; unsigned int ls_first; - unsigned int ls_first_done; unsigned int ls_nodir; Since ls_flags and ls_first

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2011-12-21 Thread David Teigland
On Wed, Dec 21, 2011 at 10:45:21AM +, Steven Whitehouse wrote: I don't think I understand whats going on in that case. What I thought should be happening was this: - Try to get mounter lock in EX - If successful, then we are the first mounter so recover all journals - Write

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2011-12-20 Thread David Teigland
On Tue, Dec 20, 2011 at 10:39:08AM +, Steven Whitehouse wrote: I dislike arbitrary delays also, so I'm hesitant to add them. The choices here are: - removing NOQUEUE from the requests below, but with NOQUEUE you have a much better chance of killing a mount command, which is a fairly

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2011-12-20 Thread David Teigland
On Tue, Dec 20, 2011 at 02:16:43PM -0500, David Teigland wrote: On Tue, Dec 20, 2011 at 10:39:08AM +, Steven Whitehouse wrote: I dislike arbitrary delays also, so I'm hesitant to add them. The choices here are: - removing NOQUEUE from the requests below, but with NOQUEUE you have

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2011-12-19 Thread David Teigland
On Mon, Dec 19, 2011 at 01:07:38PM +, Steven Whitehouse wrote: struct lm_lockstruct { int ls_jid; unsigned int ls_first; - unsigned int ls_first_done; unsigned int ls_nodir; Since ls_flags and ls_first are also only boolean flags, they could potentially be moved

Re: [Cluster-devel] [PATCH 3/5] dlm: add node slots and generation

2011-12-19 Thread David Teigland
Nit, but this should have some spaces, iow, i + 1; -error = check_config(ls, rc, nodeid); +error = check_rcom_config(ls, rc, nodeid); yeah, I'll change those, thanks

Re: [Cluster-devel] [PATCH 4/5] dlm: add recovery callbacks

2011-12-19 Thread David Teigland
On Mon, Dec 19, 2011 at 12:36:57PM +, Steven Whitehouse wrote: + struct dlm_lockspace_ops ls_ops; ^^ I'd suggest just keeping a pointer to this, see below. +static int new_lockspace(const char *name, const char *cluster, uint32_t flags, +

[Cluster-devel] dlm patches

2011-12-16 Thread David Teigland
This is the current series of dlm patches from https://github.com/teigland/linux-dlm/tree/devel9 The first is already pushed to linux-next for the next merge cycle. The others, which allow gfs2 to be used without gfs_controld, are still being tested, and may be ready for the next merge cycle,

[Cluster-devel] [PATCH 3/5] dlm: add node slots and generation

2011-12-16 Thread David Teigland
slot number. A new generation number is also added to a lockspace. It is set and incremented during each recovery along with the slot collection/assignment. The slot numbers will be passed to gfs2 which will use them as journal ids. Signed-off-by: David Teigland teigl...@redhat.com --- fs/dlm

[Cluster-devel] [PATCH 1/5] dlm: convert rsb list to rb_tree

2011-12-16 Thread David Teigland
-by: David Teigland teigl...@redhat.com --- fs/dlm/debug_fs.c | 28 --- fs/dlm/dlm_internal.h |9 +++-- fs/dlm/lock.c | 87 +++- fs/dlm/lockspace.c| 23 + fs/dlm/recover.c | 21 +++ 5 files

[Cluster-devel] [PATCH 2/5] dlm: move recovery barrier calls

2011-12-16 Thread David Teigland
Put all the calls to recovery barriers in the same function to clarify where they each happen. Should not change any behavior. Also modify some recovery debug lines to make them consistent. Signed-off-by: David Teigland teigl...@redhat.com --- fs/dlm/dir.c |1 - fs/dlm/member.c

[Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2011-12-16 Thread David Teigland
to track journals that need recovery Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/glock.c |2 +- fs/gfs2/glock.h |7 +- fs/gfs2/incore.h| 51 ++- fs/gfs2/lock_dlm.c | 979 ++- fs/gfs2

[Cluster-devel] [PATCH 4/5] dlm: add recovery callbacks

2011-12-16 Thread David Teigland
their kernel counterparts. These callbacks allow the same coordination directly, and more simply. Signed-off-by: David Teigland teigl...@redhat.com --- fs/dlm/config.c | 130 ++-- fs/dlm/config.h | 17 +++- fs/dlm/dlm_internal.h | 20 ++ fs/dlm

Re: [Cluster-devel] GFS2: glock statistics gathering (RFC)

2011-11-04 Thread David Teigland
On Fri, Nov 04, 2011 at 04:57:31PM +, Steven Whitehouse wrote: Hi, On Fri, 2011-11-04 at 12:31 -0400, David Teigland wrote: On Fri, Nov 04, 2011 at 03:19:49PM +, Steven Whitehouse wrote: The three pairs of mean/variance measure the following things: 1. DLM lock time

Re: [Cluster-devel] [Upstream patch] DLM: Convert rsb data from linked list to rb_tree

2011-10-25 Thread David Teigland
Hi Bob, I've made a few minor/cosmetic changes and attached my current version (not tested yet). static int shrink_bucket(struct dlm_ls *ls, int b) { + struct rb_node *n = NULL; struct dlm_rsb *r; int count = 0, found; for (;;) { found = 0;

[Cluster-devel] cluster4 gfs_controld

2011-10-13 Thread David Teigland
Here's the outline of my plan to remove/replace the essential bits of gfs_controld in cluster4. I expect it'll go away entirely, but there could be one or two minor things it would still handle on the side. kernel dlm/gfs2 will continue to be operable with either . cluster3

Re: [Cluster-devel] cluster4 gfs_controld

2011-10-13 Thread David Teigland
On Thu, Oct 13, 2011 at 03:41:31PM +0100, Steven Whitehouse wrote: cluster4 . jid from dlm-kernel slots which will be assigned similarly What is the actual algorithm used to assign these slots? The same as picking jids: lowest unused id starting with 0. As for implementation, I'll add it to
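The reply gives the slot rule as "lowest unused id starting with 0", the same rule used for picking jids. A minimal sketch of that rule; lowest_unused_slot() is a hypothetical helper that receives the slots already held by current members.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return the lowest slot number, starting at 0, that
 * does not appear in used[0..n_used-1]. The answer is never above n_used,
 * so the outer loop always terminates. */
static int lowest_unused_slot(const int *used, size_t n_used)
{
	assert(used != NULL || n_used == 0);
	for (int candidate = 0; ; candidate++) {
		bool taken = false;

		for (size_t i = 0; i < n_used; i++) {
			if (used[i] == candidate) {
				taken = true;
				break;
			}
		}
		if (!taken)
			return candidate;
	}
}
```

With members holding slots 0, 1 and 3, a joining node gets 2; with no members it gets 0. The quadratic scan is fine for cluster-sized member lists.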

Re: [Cluster-devel] cluster4 gfs_controld

2011-10-13 Thread David Teigland
On Thu, Oct 13, 2011 at 05:16:29PM +0100, Steven Whitehouse wrote: Hi, On Thu, 2011-10-13 at 11:30 -0400, David Teigland wrote: On Thu, Oct 13, 2011 at 03:41:31PM +0100, Steven Whitehouse wrote: cluster4 . jid from dlm-kernel slots which will be assigned similarly What

Re: [Cluster-devel] [Upstream patch] DLM: Convert rsb data from linked list to rb_tree

2011-10-10 Thread David Teigland
On Sat, Oct 08, 2011 at 06:13:52AM -0400, Bob Peterson wrote: - Original Message - | On Wed, Oct 05, 2011 at 03:25:39PM -0400, Bob Peterson wrote: | Hi, | | This upstream patch changes the way DLM keeps track of RSBs. | Before, they were in a linked list off a hash table. Now,

Re: [Cluster-devel] [coverity] liblogthread

2011-10-10 Thread David Teigland
On Mon, Oct 10, 2011 at 10:45:17AM +0200, Fabio M. Di Nitto wrote: This is the first patchset to address some issues spotted by Coverity scan. look fine

Re: [Cluster-devel] [Upstream patch] DLM: Convert rsb data from linked list to rb_tree

2011-10-10 Thread David Teigland
On Mon, Oct 10, 2011 at 04:51:01PM +0100, Steven Whitehouse wrote: Hi, On Mon, 2011-10-10 at 10:43 -0400, David Teigland wrote: On Sat, Oct 08, 2011 at 06:13:52AM -0400, Bob Peterson wrote: - Original Message - | On Wed, Oct 05, 2011 at 03:25:39PM -0400, Bob Peterson wrote

Re: [Cluster-devel] [Upstream patch] DLM: Convert rsb data from linked list to rb_tree

2011-10-10 Thread David Teigland
On Mon, Oct 10, 2011 at 08:00:07PM +0100, Steven Whitehouse wrote: The fact remains that caching as much as possible tends to be harmful, and some careful limiting would be a good investment. There is a limit. The point is that the limit is dynamic and depends on memory pressure. the as

Re: [Cluster-devel] [Upstream patch] DLM: Convert rsb data from linked list to rb_tree

2011-10-05 Thread David Teigland
On Wed, Oct 05, 2011 at 03:25:39PM -0400, Bob Peterson wrote: Hi, This upstream patch changes the way DLM keeps track of RSBs. Before, they were in a linked list off a hash table. Now, they're an rb_tree off the same hash table. This speeds up DLM lookups greatly. Today's DLM is faster
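The patch keeps the existing hash table but turns each bucket into the root of a tree, so a lookup walks one tree instead of a linked list. The sketch below shows that layout with a plain, unbalanced binary search tree per bucket for brevity; the kernel's rb_tree also rebalances on insert, which this sketch does not. struct res, res_find(), res_insert(), the djb2 hash and the table sizes are all hypothetical stand-ins, not the dlm rsb code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64   /* hypothetical table size */
#define NAME_LEN 64   /* hypothetical maximum name length, incl. nul */

/* Hypothetical stand-in for an rsb, keyed by resource name. */
struct res {
	char name[NAME_LEN];
	struct res *left, *right;   /* tree links within one bucket */
};

struct res_table {
	struct res *bucket[NBUCKETS];   /* root of each bucket's tree */
};

/* djb2 string hash, reduced to a bucket index. */
static unsigned hash_name(const char *name)
{
	unsigned h = 5381;

	while (*name)
		h = h * 33 + (unsigned char)*name++;
	return h % NBUCKETS;
}

/* Lookup walks one bucket's tree: O(depth) rather than O(bucket length). */
static struct res *res_find(const struct res_table *t, const char *name)
{
	struct res *r = t->bucket[hash_name(name)];

	while (r) {
		int cmp = strcmp(name, r->name);

		if (cmp == 0)
			return r;
		r = cmp < 0 ? r->left : r->right;
	}
	return NULL;
}

/* Insert a resource by name, or return the existing one.
 * Returns NULL only if allocation fails. */
static struct res *res_insert(struct res_table *t, const char *name)
{
	struct res **link = &t->bucket[hash_name(name)];
	size_t len = strlen(name);

	assert(len < NAME_LEN);
	while (*link) {
		int cmp = strcmp(name, (*link)->name);

		if (cmp == 0)
			return *link;
		link = cmp < 0 ? &(*link)->left : &(*link)->right;
	}
	struct res *r = calloc(1, sizeof(*r));
	if (!r)
		return NULL;
	memcpy(r->name, name, len + 1);
	*link = r;   /* new leaf; no rebalancing in this sketch */
	return r;
}
```

Because each bucket is searched by comparison rather than by walking every entry, long buckets (many resources per hash slot) stop dominating lookup time, which is the effect the patch description reports.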

Re: [Cluster-devel] dlm: master - dlm: clear out old stuff and build system

2011-10-03 Thread David Teigland
dlm/libdlm/libdlm.pc.in | 11 - dlm/libdlm/libdlm_lt.pc.in| 11 - dropping the .pc file is going to break dlm users. pc files are used by different build systems (not just autotools/autoconf) to detect libdlm and link against it correctly. Similar to what

Re: [Cluster-devel] dlm: master - dlm_controld: remove ccs and new Makefile

2011-09-30 Thread David Teigland
On Fri, Sep 30, 2011 at 01:07:01PM +0200, Fabio M. Di Nitto wrote: On 09/30/2011 12:02 AM, David Teigland wrote: add a normal, sane Makefile If you plan to drop autoconf+autotool, that is your call (I disagree for several reasons, but it's your project and I am not going to argue

[Cluster-devel] [PATCH 2/3] dlm_controld: full check for member changes

2011-09-28 Thread David Teigland
the incarnation numbers of members from consecutive queries to avoid this. bz 663397 Signed-off-by: David Teigland teigl...@redhat.com --- group/dlm_controld/member_cman.c | 79 -- 1 files changed, 75 insertions(+), 4 deletions(-) diff --git a/group

[Cluster-devel] [PATCH 1/3] fenced: full check for member changes

2011-09-28 Thread David Teigland
the incarnation numbers of members from consecutive queries to avoid this. bz 663397 Signed-off-by: David Teigland teigl...@redhat.com --- fence/fenced/member_cman.c | 36 1 files changed, 36 insertions(+), 0 deletions(-) diff --git a/fence/fenced

[Cluster-devel] [PATCH 3/3] gfs_controld: full check for member changes

2011-09-28 Thread David Teigland
the incarnation numbers of members from consecutive queries to avoid this. bz 663397 Signed-off-by: David Teigland teigl...@redhat.com --- group/gfs_controld/member_cman.c | 51 +++--- 1 files changed, 47 insertions(+), 4 deletions(-) diff --git a/group
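The three patches above compare members' incarnation numbers across consecutive membership queries, so that a node which left and rejoined between two queries is not mistaken for an unchanged member. A minimal sketch of that comparison; struct member and members_changed() are hypothetical, not the libcman API, and nodeids are assumed unique within each list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical member record from one membership query. */
struct member {
	int nodeid;
	unsigned int incarnation;   /* changes each time the node rejoins */
};

/* Returns true if membership differs between two consecutive queries:
 * a previous member is missing, a previous member reappears with a new
 * incarnation (it left and rejoined in between), or a node was added. */
static bool members_changed(const struct member *prev, size_t n_prev,
			    const struct member *cur, size_t n_cur)
{
	assert((prev != NULL || n_prev == 0) && (cur != NULL || n_cur == 0));
	for (size_t i = 0; i < n_prev; i++) {
		bool found = false;

		for (size_t j = 0; j < n_cur; j++) {
			if (cur[j].nodeid != prev[i].nodeid)
				continue;
			found = true;
			if (cur[j].incarnation != prev[i].incarnation)
				return true;   /* same nodeid, new incarnation */
			break;
		}
		if (!found)
			return true;   /* node removed */
	}
	/* Every previous member is still present and unchanged, so any
	 * difference in count can only come from added nodes. */
	return n_cur != n_prev;
}
```

Comparing nodeids alone would report "no change" for the rejoined case; the incarnation check is what catches it.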

Re: [Cluster-devel] dlm: master - dlm_controld: new plock state transfer

2011-09-26 Thread David Teigland
On Sat, Sep 24, 2011 at 07:13:34AM +0200, Fabio M. Di Nitto wrote: Quick question.. deadlock.c/netlink.c have been dropped from the build and not referenced anywhere for distribution. Is it a plan to kill them completely or do they need porting? I'm going to leave the files there as artifacts

Re: [Cluster-devel] DLM + SCTP bug (was Re: [DRBD-user] kernel panic with DRBD: solved)

2011-09-12 Thread David Teigland
When node A starts back up, the SCTP protocol notices this (as it's supposed to), and delivers an SCTP_ASSOC_CHANGE / SCTP_RESTART notification to the SCTP socket, telling the socket owner (the dlm_recv thread) that the other node has restarted. DLM responds by telling SCTP to create a

Re: [Cluster-devel] [PATCH 4/5] gfs_controld: Remove dead code from loop()

2011-09-06 Thread David Teigland
On Tue, Sep 06, 2011 at 01:00:16PM +0100, Andrew Price wrote: This patch removes an if statement where the true branch is never taken. At this point in the code, poll_timeout could only be 500 or -1. Signed-off-by: Andrew Price anpr...@redhat.com --- group/gfs_controld/main.c |3 ---

[Cluster-devel] [PATCH] dlm_controld: fix plock dev_write no op

2011-08-18 Thread David Teigland
). This is because the kernel generates extraneous plock unlock requests when files are closed with flocks. Because dlm_controld finds no plocks on the files, it replies to the kernel with an error, rather than skipping the reply to do CLOSE. bz 731775 Signed-off-by: David Teigland teigl...@redhat.com

Re: [Cluster-devel] [RFT] dlm: replace lkb hash table with idr

2011-07-08 Thread David Teigland
On Wed, Jul 06, 2011 at 12:14:26PM -0400, David Teigland wrote: Request for testing I'm looking at possible improvements to the dlm hash tables. I've pushed this and another patch related to hash table performance to the tmp-testing branch, git://git.kernel.org/pub/scm/linux/kernel/git

[Cluster-devel] [RFT] dlm: replace lkb hash table with idr

2011-07-06 Thread David Teigland
Request for testing I'm looking at possible improvements to the dlm hash tables. This patch keeps lkbs in an idr instead of a hash table. Before pushing this patch further, I'd like to know if it makes any difference in environments using millions of locks on each node. From: David Teigland

Re: [Cluster-devel] [PATCH] fs, dlm: Don't leak, don't do pointless NULL checks and use kzalloc

2011-06-29 Thread David Teigland
On Wed, Jun 29, 2011 at 11:09:27PM +0200, Jesper Juhl wrote: In fs/dlm/lock.c in the dlm_scan_waiters() function there are 3 small issues: 1) first time through the loop we allocate memory for 'warned', if we then (in the loop) don't take the if (!warned) path and loop again, the second

Re: [Cluster-devel] [PATCH] fs, dlm: Don't leak, don't do pointless NULL checks and use kzalloc

2011-06-29 Thread David Teigland
On Wed, Jun 29, 2011 at 11:51:00PM +0200, Jesper Juhl wrote: I don't think so; num_nodes won't be set to zero. Hmm. How so? Maybe I'm missing something obvious, but: num_nodes is initialized to zero at the beginning of the function, which means that we'll definitely do the first

Re: [Cluster-devel] [PATCH 0/3] dlm_controld: Improvement on plock searching efficiency

2011-05-27 Thread David Teigland
On Fri, May 27, 2011 at 07:44:03AM +0800, Jiaju Zhang wrote: This series introduces a RB tree for improving plock resources searching efficiency. We met this performance issue when running Samba on top of cluster filesystem, profiling during nbench runs with num-progs=500, the dlm_controld

[Cluster-devel] dlm_controld: clear waiting plocks for closed files

2011-05-23 Thread David Teigland
when the process is killed. So the unlock-close also needs to clear any waiting plocks that were abandoned by the killed process. The corresponding kernel patch: https://lkml.org/lkml/2011/5/23/237 Signed-off-by: David Teigland teigl...@redhat.com --- group/dlm_controld/plock.c | 28
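A minimal sketch of the cleanup described above, assuming waiting requests are kept in a singly linked list per resource and tagged with their owner. struct waiter, add_waiter() and clear_waiters() are hypothetical names, not the dlm_controld plock.c code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical waiting (blocked) posix lock request on one resource. */
struct waiter {
	uint64_t owner;          /* lock owner of the blocked request */
	struct waiter *next;
};

/* Queue a waiter at the head of the list; returns NULL on allocation failure. */
static struct waiter *add_waiter(struct waiter **head, uint64_t owner)
{
	struct waiter *w = malloc(sizeof(*w));

	if (!w)
		return NULL;
	w->owner = owner;
	w->next = *head;
	*head = w;
	return w;
}

/* On unlock-close for `owner`, drop every request that owner left waiting:
 * a killed process will never come back to collect them. Returns the
 * number of waiters removed. */
static int clear_waiters(struct waiter **head, uint64_t owner)
{
	struct waiter **link = head;
	int removed = 0;

	assert(head != NULL);
	while (*link) {
		struct waiter *w = *link;

		if (w->owner == owner) {
			*link = w->next;   /* unlink; link now points at successor */
			free(w);
			removed++;
		} else {
			link = &w->next;
		}
	}
	return removed;
}
```

Walking with a pointer to the link (rather than to the node) lets the same loop remove the head and interior entries without a special case.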

Re: [Cluster-devel] RFC: generic improvement to fence agents api

2011-03-21 Thread David Teigland
On Sat, Mar 19, 2011 at 07:34:55AM +0100, Fabio M. Di Nitto wrote: My suggestion would be to allow to specify a list of ports instead. This comes up now and then. The current rule of one action per agent execution is a tried and true, fundamental property of the agent api. It should not be

Re: [Cluster-devel] [PATCH] dlm: Reset fs_notified when check_fs_done

2011-02-22 Thread David Teigland
On Tue, Feb 22, 2011 at 04:35:42PM +0800, Jiaju Zhang wrote: On Tue, Nov 9, 2010 at 6:06 AM, David Teigland teigl...@redhat.com wrote: On Mon, Nov 08, 2010 at 11:05:49PM +0800, Jiaju Zhang wrote: Luckily, things have changed now. One user met this issue two months ago and he's also very

[Cluster-devel] fenced: don't ignore victim_done messages for reduced victims

2011-02-22 Thread David Teigland
dlm_controld to wait indefinitely for fencing to complete for the reduced victim. The fix is to simply record the information from a victim_done message even if the node is not in the victims list. bz 678704 Signed-off-by: David Teigland teigl...@redhat.com --- fence/fenced/cpg.c | 18

Re: [Cluster-devel] [PATCH] fenced: send dbus signal when node is fenced

2011-02-03 Thread David Teigland
+void dbus_init (void) No space before ( Also, it would be a good idea to put a fenced-specific prefix before fenced's own dbus functions, e.g. fd_dbus_init(), because dbus_ is the dbus lib's namespace and open to symbol collisions. +{ +#ifdef DBUS + +if (!(bus = dbus_bus_get_private

Re: [Cluster-devel] [PATCH] fenced: send dbus signal when node is fenced

2011-02-03 Thread David Teigland
On Thu, Feb 03, 2011 at 01:26:07PM -0600, Ryan O'Hara wrote: This patch adds the ability to send a dbus signal when a node is fenced. This code can reestablish a connection with dbus if necessary. ACK

Re: [Cluster-devel] [PATCH] dlm: send_bast_queue() skip list loop not only sending basts to convertqueue

2011-01-04 Thread David Teigland
On Tue, Jan 04, 2011 at 06:06:51PM -0200, cmaiol...@redhat.com wrote: The resource groups got corrupted without this patch: I could see an extraneous bast leading to confusion in gfs2 about the lock state, but gfs2 should probably be asserting somewhere before it actually corrupts anything...

Re: [Cluster-devel] [PATCH] dlm: sanitize work_start() in lowcomms.c

2010-12-13 Thread David Teigland
On Tue, Dec 14, 2010 at 12:28:25AM +0900, Namhyung Kim wrote: The create_workqueue() returns NULL if failed rather than ERR_PTR(). Fix error checking and remove unnecessary variable 'error'. I adapted this to the alloc_workqueue patch in next and pushed to next. Dave

Re: [Cluster-devel] Patch: making DLM more robust

2010-12-01 Thread David Teigland
On Wed, Dec 01, 2010 at 10:23:25AM +0100, Menyhart Zoltan wrote: If we cannot obtain a given resource within a limited time frame, then it is a real error for the customer: s/he cannot mount an OCFS2 volume, cannot issue a cluster command, etc. Matter of opinion and preference I suppose. 2.

Re: [Cluster-devel] Patch: making DLM more robust

2010-11-30 Thread David Teigland
On Tue, Nov 30, 2010 at 05:57:50PM +0100, Menyhart Zoltan wrote: Hi, An easy first step to make DLM more robust can be adding a time out protection to the lock space creation operation, while waiting for a dlm_controld action. A new member ci_dlm_controld_secs is added to dlm_config to set

Re: [Cluster-devel] ->ls_in_recovery not released

2010-11-23 Thread David Teigland
On Tue, Nov 23, 2010 at 03:58:42PM +0100, Menyhart Zoltan wrote: David Teigland wrote: On Mon, Nov 22, 2010 at 05:31:25PM +0100, Menyhart Zoltan wrote: We have got a two-node OCFS2 file system controlled by the pacemaker. Are you using dlm_controld.pcmk? Yes. If so, please try

Re: [Cluster-devel] -ls_in_recovery not released

2010-11-22 Thread David Teigland
On Mon, Nov 22, 2010 at 05:31:25PM +0100, Menyhart Zoltan wrote: We have got a two-node OCFS2 file system controlled by the pacemaker. Are you using dlm_controld.pcmk? If so, please try the latest versions of pacemaker that use the standard dlm_controld. The problem may be related to the

Re: [Cluster-devel] dlm: Use cmwq for send and receive workqueues

2010-11-12 Thread David Teigland
On Fri, Nov 12, 2010 at 12:12:29PM +, Steven Whitehouse wrote: So far as I can tell, there is no reason to use a single-threaded send workqueue for dlm, since it may need to send to several sockets concurrently. Both workqueues are set to WQ_MEM_RECLAIM to avoid any possible deadlocks,

Re: [Cluster-devel] dlm: Use cmwq for send and receive workqueues

2010-11-12 Thread David Teigland
On Fri, Nov 12, 2010 at 04:20:35PM +, Steven Whitehouse wrote: Hi, On Fri, 2010-11-12 at 11:12 -0500, David Teigland wrote: On Fri, Nov 12, 2010 at 12:12:29PM +, Steven Whitehouse wrote: So far as I can tell, there is no reason to use a single-threaded send workqueue

Re: [Cluster-devel] [PATCH] dlm: Reset fs_notified when check_fs_done

2010-11-08 Thread David Teigland
On Mon, Nov 08, 2010 at 11:05:49PM +0800, Jiaju Zhang wrote: Luckily, things have changed now. One user met this issue two months ago and he was also kind enough to test the patch. The result is that the patch really works. Attached is the log from before they applied the patch. This time the log has

Re: [Cluster-devel] dlm: Don't send callback to node making lock request when try 1cb fails

2010-09-03 Thread David Teigland
to an inode whose glock is cached on another, idle (wrt that glock) node. (comment added, dct) Signed-off-by: Steven Whitehouse swhit...@redhat.com Tested-by: Abhijith Das a...@redhat.com Signed-off-by: David Teigland teigl...@redhat.com --- fs/dlm/lock.c |3 +++ 1 files changed, 3 insertions

Re: [Cluster-devel] [PATCH] dlm: use genl_register_family_with_ops()

2010-07-26 Thread David Teigland
On Mon, Jul 26, 2010 at 05:19:19PM +0800, Changli Gao wrote: Signed-off-by: Changli Gao xiao...@gmail.com fs/dlm/netlink.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/fs/dlm/netlink.c b/fs/dlm/netlink.c index 2c6ad51..ef17e01 100644 ---

Re: [Cluster-devel] GFS2: Wait for journal id on mount if not specified on mount command line

2010-06-08 Thread David Teigland
On Tue, Jun 08, 2010 at 09:34:52AM +0100, Steven Whitehouse wrote: A couple obvious questions from the start... - What if gfs_controld isn't running? It will hang until mount is killed, where upon it will clean up and exit gracefully. Right, so instead of failing with an error, it hangs and

Re: [Cluster-devel] GFS2: Wait for journal id on mount if not specified on mount command line

2010-06-07 Thread David Teigland
On Mon, Jun 07, 2010 at 04:39:09PM +0100, Steven Whitehouse wrote: This patch implements a wait for the journal id in the case that it has not been specified on the command line. This is to allow the future removal of the mount.gfs2 helper. The journal id would instead be directly

[Cluster-devel] [PATCH 0/2] dlm patches for 2.6.35

2010-05-19 Thread David Teigland
These are the dlm patches from -next for the 2.6.35 merge. Dan Carpenter (1): dlm: cleanup remove unused code David Teigland (1): dlm: fix ast ordering for user locks fs/dlm/lock.c |5 +-- fs/dlm/user.c | 88 +++- 2 files

[Cluster-devel] [PATCH 1/2] dlm: cleanup remove unused code

2010-05-19 Thread David Teigland
. Signed-off-by: Dan Carpenter erro...@gmail.com Signed-off-by: David Teigland teigl...@redhat.com --- fs/dlm/lock.c |5 + 1 files changed, 1 insertions(+), 4 deletions(-) diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index 17903b4..031dbe3 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -733,10

[Cluster-devel] [PATCH 2/2] dlm: fix ast ordering for user locks

2010-05-19 Thread David Teigland
Commit 7fe2b3190b8b299409f13cf3a6f85c2bd371f8bb fixed possible misordering of completion asts (casts) and blocking asts (basts) for kernel locks. This patch does the same for locks taken by user space applications. Signed-off-by: David Teigland teigl...@redhat.com --- fs/dlm/user.c | 88

Re: [Cluster-devel] [PATCH] gfs2: Fix handling of mount points with spaces

2010-04-26 Thread David Teigland
On Mon, Apr 26, 2010 at 04:09:25PM -0400, Lon Hohberger wrote: Mount points may contain spaces, tabs, newlines, and the backslash character according to getmntent(3). Unfortunately, while scanning /proc/mounts, mount.gfs2 was not unescaping the escape sequences, causing mount.gfs2 to not
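getmntent(3) documents that space, tab and newline in mount fields are written as the octal escapes \040, \011 and \012, and that backslash is escaped as well. A minimal sketch of the unescaping step, decoding any three-digit octal escape and a doubled backslash in place; unescape_mount_field() is a hypothetical name, not the function the patch adds to mount.gfs2.

```c
#include <assert.h>
#include <string.h>

/* Decode in place the escapes /proc/mounts uses for awkward characters,
 * e.g. "/mnt/my\040disk" becomes "/mnt/my disk". Three-digit octal escapes
 * and "\\" are decoded; any other text is copied unchanged. */
static void unescape_mount_field(char *s)
{
	char *out;

	assert(s != NULL);
	s = strchr(s, '\\');   /* nothing before the first backslash changes */
	if (!s)
		return;
	out = s;
	while (*s) {
		/* Short-circuit order guarantees s[2] and s[3] are only read
		 * when the preceding byte was a digit, never past the nul. */
		if (s[0] == '\\' && s[1] >= '0' && s[1] <= '7' &&
		    s[2] >= '0' && s[2] <= '7' && s[3] >= '0' && s[3] <= '7') {
			*out++ = (char)(((s[1] - '0') << 6) |
					((s[2] - '0') << 3) | (s[3] - '0'));
			s += 4;
		} else if (s[0] == '\\' && s[1] == '\\') {
			*out++ = '\\';
			s += 2;
		} else {
			*out++ = *s++;
		}
	}
	*out = '\0';
}
```

Decoding in place is safe because every escape is longer than the byte it produces, so the write cursor never overtakes the read cursor.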

[Cluster-devel] [PATCH] gfs2: add nfslocks mount option

2010-03-25 Thread David Teigland
Using the nfslocks mount option causes gfs2 to pass posix locks from nfs to the dlm to be clustered. It is off by default because posix locks from nfs are not fully handled by the cluster. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/file.c | 11 +++ fs/gfs2

Re: [Cluster-devel] [patch] dlm: cleanup remove unused code

2010-03-22 Thread David Teigland
On Mon, Mar 22, 2010 at 03:03:54PM +0300, Dan Carpenter wrote: Smatch complains because lkb is never NULL. Looking at it, the original code actually adds the new element to the end of the list fine, so we can just get rid of the if condition. This code is four years old and no one has

[Cluster-devel] [PATCH 2/4] dlm: send reply before bast

2010-02-26 Thread David Teigland
locks that can now be granted The cast is queued on the local node when the reply from the lock master is received. This means that a lock holder can receive a bast for a lock mode that it doesn't yet know has been granted. Signed-off-by: David Teigland teigl...@redhat.com --- fs/dlm/lock.c | 110

[Cluster-devel] [PATCH 3/4] dlm: Send lockspace name with uevents

2010-02-26 Thread David Teigland
From: Steven Whitehouse swhit...@redhat.com Although it is possible to get this information from the path, it's much easier to provide the lockspace as a separate env variable. Signed-off-by: Steven Whitehouse swhit...@redhat.com Signed-off-by: David Teigland teigl...@redhat.com --- fs/dlm

[Cluster-devel] [PATCH 0/4] dlm patches for 2.6.34

2010-02-26 Thread David Teigland
/dlm/user.c | 10 +++-- fs/dlm/user.h |4 +- 8 files changed, 180 insertions(+), 58 deletions(-) David Teigland (3): dlm: fix ordering of bast and cast dlm: send reply before bast dlm: use bastmode in debugfs output Steven Whitehouse (1): dlm: Send

[Cluster-devel] [PATCH 4/4] dlm: use bastmode in debugfs output

2010-02-26 Thread David Teigland
that value in the debugfs print. Signed-off-by: David Teigland teigl...@redhat.com --- fs/dlm/debug_fs.c |2 +- fs/dlm/lock.c |6 -- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/dlm/debug_fs.c b/fs/dlm/debug_fs.c index 375a235..29d6139 100644 --- a/fs/dlm/debug_fs.c

[Cluster-devel] [PATCH 1/4] dlm: fix ordering of bast and cast

2010-02-26 Thread David Teigland
queues a cast immediately after sending the demote message. In this way a cast can be queued for a mode, e.g. NL, that makes an in-transit bast extraneous. Signed-off-by: David Teigland teigl...@redhat.com --- fs/dlm/ast.c | 74 ++-- fs/dlm

Re: [Cluster-devel] [PATCH 2/2] dlm: Remove obsolete lockspace lookup

2010-02-18 Thread David Teigland
On Thu, Feb 18, 2010 at 09:16:03AM +, Steven Whitehouse wrote: I'm not sure what more I can say here: this is a sysfs file store function and one of the reasons for using it is that sysfs looks after the ref counting for you. Even aside from that, if you don't have a reference to the

Re: [Cluster-devel] [PATCH 2/2] dlm: Remove obsolete lockspace lookup

2010-02-17 Thread David Teigland
On Wed, Feb 17, 2010 at 09:41:35AM +, Steven Whitehouse wrote: We don't need to look up the lockspace in this particular case since we already have a pointer to it (which was being dereferenced in order to do the lookup in the first place). It'll take more to convince me that that

Re: [Cluster-devel] dlm: Remove/bypass astd

2010-02-17 Thread David Teigland
On Wed, Feb 17, 2010 at 01:23:39PM +, Steven Whitehouse wrote: While investigating Red Hat bug #537010 I started looking at the dlm's astd thread. The way in which the cast and bast requests are queued looked as if it might cause reordering since the bast requests are always delivered

Re: [Cluster-devel] skipping unused services, cman_tool join -A

2010-01-28 Thread David Teigland
On Thu, Jan 28, 2010 at 09:11:06AM +, Christine Caulfield wrote: So, let me get this straight. For a cman upgrade you want to load the same services as before. But for a new full-featured RHEL6 cluster suite you think we should load fewer services, thus removing features ? Exactly,

Re: [Cluster-devel] skipping unused services, cman_tool join -A

2010-01-27 Thread David Teigland
On Fri, Sep 25, 2009 at 02:27:52PM -0500, David Teigland wrote: To avoid loading+running services that you don't use (e.g. to avoid bugs crashing the system from a service you're not using) add to cluster.conf <service name="corosync_cman" ver="0"/> <service name="openais_ckpt" ver="0"/> run

Re: [Cluster-devel] skipping unused services, cman_tool join -A

2010-01-27 Thread David Teigland
On Wed, Jan 27, 2010 at 01:51:38PM -0700, Steven Dake wrote: On Wed, 2010-01-27 at 14:49 -0600, David Teigland wrote: On Fri, Sep 25, 2009 at 02:27:52PM -0500, David Teigland wrote: To avoid loading+running services that you don't use (e.g. to avoid bugs crashing the system from a service

[Cluster-devel] cluster3 man pages

2010-01-21 Thread David Teigland
In the process of updating content in my cluster3 man pages, I also tried to align the style/structure/etc with this standard: http://www.kernel.org/doc/man-pages/online/pages/man7/man-pages.7.html I think it would be nice to try to sync up all our man pages with those conventions, it's been

Re: [Cluster-devel] cluster3 man pages

2010-01-21 Thread David Teigland
On Thu, Jan 21, 2010 at 05:52:14PM +0100, Fabio M. Di Nitto wrote: On 1/21/2010 5:28 PM, David Teigland wrote: In the process of updating content in my cluster3 man pages, I also tried to align the style/structure/etc with this standard: http://www.kernel.org/doc/man-pages/online/pages

[Cluster-devel] fixing cpg names in 3.0.8?

2010-01-12 Thread David Teigland
I just discovered that the cpg group names that fenced, dlm_controld and gfs_controld use in cluster3 include a nul within the name length. This can happen because cpg names have both a string part and a length part, and the length is currently being set to strlen + 1 instead of strlen. It's

Re: [Cluster-devel] fixing cpg names in 3.0.8?

2010-01-12 Thread David Teigland
On Tue, Jan 12, 2010 at 04:00:20PM +, Christine Caulfield wrote: On 12/01/10 16:24, David Teigland wrote: I just discovered that the cpg group names that fenced, dlm_controld and gfs_controld use in cluster3 include a nul within the name length. This can happen because cpg names have both
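The fix amounts to setting the length part to strlen(name) rather than strlen(name) + 1. A sketch using a local stand-in with the same two fields as corosync's struct cpg_name (a length and a fixed-size value buffer), so it compiles without corosync headers. The 128-byte limit, set_group_name() and group_name_equal() are assumptions for illustration, as is the premise that names are matched on length plus bytes, which is why the two length conventions would not see each other as the same group.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NAME_MAX_LEN 128   /* assumed stand-in for the cpg name limit */

/* Local stand-in with the same shape as a cpg name: length + bytes. */
struct group_name {
	uint32_t length;
	char value[NAME_MAX_LEN];
};

/* Fill in a group name so that `length` covers only the visible
 * characters (strlen), not the terminating nul (strlen + 1).
 * Returns -1 if the name does not fit. */
static int set_group_name(struct group_name *gn, const char *str)
{
	size_t len = strlen(str);

	if (len >= NAME_MAX_LEN)
		return -1;
	memset(gn, 0, sizeof(*gn));
	memcpy(gn->value, str, len);
	gn->length = (uint32_t)len;
	return 0;
}

/* Assumed matching rule: both the length and the bytes must agree. */
static int group_name_equal(const struct group_name *a,
			    const struct group_name *b)
{
	return a->length == b->length &&
	       memcmp(a->value, b->value, a->length) == 0;
}
```

Under that rule, a daemon using strlen and one still using strlen + 1 for the same string would end up in different groups, which is why the change has to be coordinated across fenced, dlm_controld and gfs_controld.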

Re: [Cluster-devel] fencing conditions: what should trigger a fencing operation?

2009-11-19 Thread David Teigland
On Thu, Nov 19, 2009 at 04:15:58PM +, Steven Whitehouse wrote: Hi, On Thu, 2009-11-19 at 11:04 -0600, David Teigland wrote: On Thu, Nov 19, 2009 at 12:35:05PM +0100, Fabio M. Di Nitto wrote: - what are the current fencing policies? node failure I think what Fabio is asking

Re: [Cluster-devel] fencing conditions: what should trigger a fencing operation?

2009-11-19 Thread David Teigland
On Thu, Nov 19, 2009 at 07:10:54PM +0100, Fabio M. Di Nitto wrote: David Teigland wrote: On Thu, Nov 19, 2009 at 12:35:05PM +0100, Fabio M. Di Nitto wrote: - what are the current fencing policies? node failure - what can we do to improve them? node failure is a simple, black

[Cluster-devel] Re: [PATCH 2/2] dlm: Add down/up_write_non_owner to keep lockdep happy

2009-11-12 Thread David Teigland
On Thu, Nov 12, 2009 at 02:29:18PM +, Steven Whitehouse wrote: Hi, On Thu, 2009-11-12 at 15:22 +0100, Ingo Molnar wrote: * Steven Whitehouse swhit...@redhat.com wrote: I looked at possibly changing this to use completions, but it seems that the usage here is not easily adapted

[Cluster-devel] Re: [PATCH 2/2] dlm: Add down/up_write_non_owner to keep lockdep happy

2009-11-12 Thread David Teigland
On Thu, Nov 12, 2009 at 05:24:12PM +, Steven Whitehouse wrote: Nov 12 15:10:01 chywoon kernel: [ INFO: possible recursive locking detected ] That recursive locking trace is something different. up_write_non_owner() addresses this trace, which as you say, is from doing the down

Re: [Cluster-devel] Re: [PATCH] misc: use a proper range for minor number dynamic allocation

2009-11-09 Thread David Teigland
On Mon, Nov 09, 2009 at 01:28:36PM -0800, Andrew Morton wrote: On Fri, 23 Oct 2009 21:28:17 -0200 Thadeu Lima de Souza Cascardo casca...@holoscopio.com wrote: The current dynamic allocation of minor number for misc devices has some drawbacks. First of all, the range for dynamic

Re: [Cluster-devel] SCTP versus OpenAIS/corosync time-outs

2009-11-02 Thread David Teigland
On Mon, Nov 02, 2009 at 08:41:43AM +, Christine Caulfield wrote: To be honest, RRP DLM/SCTP is not well tested or used. There are probably lots of things that could be done to improve it. In particular the failover aspect of it (the most important part of course) has probably not been

Re: [Cluster-devel] mount.gfs2 hangs on cluster-3.0.3

2009-11-02 Thread David Teigland
On Sun, Nov 01, 2009 at 04:20:32PM +0200, Dan Candea wrote: connect(3, {sa_family=AF_FILE, path=@gfsc_sock...}, 12) = 0 write(3, \\o\\o\1\0\1\0\7\0\0\0\0\0\0\0`p\0\0\0\0\0\0\0\0\0\0\0\0\0\0s..., 28768) = 28768 read(3, Need to find out what gfs_controld is doing: gfs_control dump strace

[Cluster-devel] Re: [RFC PATCH] dlm: enhancing dlm_controld (pcmk) to be able to handle redundant rings

2009-10-23 Thread David Teigland
On Fri, Oct 23, 2009 at 09:23:20PM +0800, Jiaju Zhang wrote: +result = corosync_cfg_ring_status_get(handle, + interface_names, + interface_status, + interface_count); +if

Re: [Cluster-devel] Re: [RFC PATCH] dlm: enhancing dlm_controld (pcmk) to be able to handle redundant rings

2009-10-23 Thread David Teigland
On Fri, Oct 23, 2009 at 12:18:09PM -0700, Steven Dake wrote: On Fri, 2009-10-23 at 12:55 -0500, David Teigland wrote: On Fri, Oct 23, 2009 at 09:23:20PM +0800, Jiaju Zhang wrote: +result = corosync_cfg_ring_status_get(handle, + interface_names

Re: [Cluster-devel] Re: gfs2-utils: master - gfs_controld: Remove three unused functions

2009-10-19 Thread David Teigland
On Mon, Oct 19, 2009 at 11:35:17AM +0100, Steven Whitehouse wrote: The question really is why we have all these (apparently) different ideas of cluster membership. Looking at gfs_controld itself, it uses two CPGs (one for all gfs_controlds which seems to only be used in negotiating the

Re: [Cluster-devel] Re: fence-agents: master - fencing: New option '--missing-as-off' to return OFF is machine is missing

2009-10-16 Thread David Teigland
On Fri, Oct 16, 2009 at 01:56:17PM +0200, Marek 'marx' Grác wrote: David Teigland wrote: fencing: New option '--missing-as-off' to return OFF is machine is missing If a blade is not present (i.e. removed for maintenance), the fence_bladecenter cannot check the state as it is reported empty

Re: [Cluster-devel] Re: gfs2-utils: master - gfs_controld: Remove three unused functions

2009-10-16 Thread David Teigland
On Fri, Oct 16, 2009 at 03:56:05PM +0100, Steven Whitehouse wrote: Hi, On Wed, 2009-10-14 at 12:53 -0500, David Teigland wrote: On Wed, Oct 14, 2009 at 02:55:04PM +, Steven Whitehouse wrote: gfs_controld: Remove three unused functions These functions are not called from

Re: [Cluster-devel] Re: gfs2-utils: master - gfs_controld: Remove three unused functions

2009-10-16 Thread David Teigland
On Fri, Oct 16, 2009 at 05:01:18PM +0100, Steven Whitehouse wrote: Hi, On Fri, 2009-10-16 at 10:59 -0500, David Teigland wrote: On Fri, Oct 16, 2009 at 03:56:05PM +0100, Steven Whitehouse wrote: Hi, On Wed, 2009-10-14 at 12:53 -0500, David Teigland wrote: On Wed, Oct 14, 2009

[Cluster-devel] Re: fence-agents: master - fencing: New option '--missing-as-off' to return OFF is machine is missing

2009-10-14 Thread David Teigland
fencing: New option '--missing-as-off' to return OFF is machine is missing If a blade is not present (i.e. removed for maintenance), the fence_bladecenter cannot check the state as it is reported empty. Resolves: bz#248006 --- a/fence/agents/bladecenter/fence_bladecenter.py +++

[Cluster-devel] Re: gfs2-utils: master - gfs_controld: Remove three unused functions

2009-10-14 Thread David Teigland
On Wed, Oct 14, 2009 at 02:55:04PM +, Steven Whitehouse wrote: gfs_controld: Remove three unused functions These functions are not called from anywhere and appear to be left over from earlier times. They were just added, but in translating the dlm_controld patch to gfs_controld I missed

Re: [Cluster-devel] Re: [PATCH 1/2] dlm: Send lockspace name with uevents

2009-10-13 Thread David Teigland
On Tue, Oct 13, 2009 at 09:53:37AM -0500, David Teigland wrote: On Tue, Oct 13, 2009 at 03:56:15PM +0100, Steven Whitehouse wrote: Although it is possible to get this information from the path, it's much easier to provide the lockspace as a separate env variable. I don't mind
