Re: [Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread David Teigland
On Thu, Nov 08, 2012 at 10:26:53AM +, Steven Whitehouse wrote: > Hi, > > On Wed, 2012-11-07 at 14:14 -0500, David Teigland wrote: > > When unmounting, gfs2 does a full dlm_unlock operation on every > > cached lock. This can create a very large amount of work and can

[Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-07 Thread David Teigland
ck is called because it may update the lvb of the resource. Signed-off-by: David Teigland --- fs/gfs2/glock.c|1 + fs/gfs2/incore.h |1 + fs/gfs2/lock_dlm.c |6 ++ 3 files changed, 8 insertions(+) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index e6c2fd5..f3a5edb 100644 ---

Re: [Cluster-devel] [PATCH] dlm_stonith_{off, reboot} aliases for fence helper

2012-11-05 Thread David Teigland
On Mon, Nov 05, 2012 at 07:05:22PM +0100, Jacek Konieczny wrote: > - rv = stonith_api_kick_helper(nodeid, 300, 1); > + rv = stonith_api_kick_helper(nodeid, 300, turn_off); I'd like it to be "reboot", but seeing the arg as "bool off" I figured the opposite would be "on" ... if you're saying

Re: [Cluster-devel] cluster4 dlm: startup notification for systemd

2012-11-05 Thread David Teigland
On Sat, Nov 03, 2012 at 04:27:54PM +0100, Jacek Konieczny wrote: > Hello, > > The two patches: > >[PATCH 1/2] --foreground option added to dlm_controld >[PATCH 2/2] Startup notification by sd_notify() > > add startup notification for the systemd service unit. This way startup > of servic

Re: [Cluster-devel] cluster4 dlm dlm_stonith - should it really fence by turning node off?

2012-11-05 Thread David Teigland
On Sat, Nov 03, 2012 at 03:58:28PM +0100, Jacek Konieczny wrote: > Hello, > > The dlm_stonith fencing helper is really convenient when Pacemaker is in > use. Though, it doesn't quite work as I would expect - when fencing > is needed it requests a node to be turned off instead of rebooting. And

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 04:55:55PM +, Dietmar Maurer wrote: > > The difficult cases, which I think you're seeing, are partitions where > > no group has quorum, e.g. 2/2. In this case we do nothing, and the > > user has to resolve it by resetting some of the nodes > > The problem with that is

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 04:26:35PM +, Dietmar Maurer wrote: > > I guess you're talking about the dlm_tool ls output? > > Yes. > > > The "fencing" there > > means it is waiting for fenced to finish fencing before it starts dlm > > recovery. > > fenced waits for quorum. > > So who actually s

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 04:12:10PM +, Dietmar Maurer wrote: > > Yes, it's a stateful partition merge, and I think /var/log/messages should > > have > > mentioned something about that. When a node is partitioned from the > > others (e.g. network disconnected), it has to be cleanly reset before

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 09:25:08AM +, Dietmar Maurer wrote: > So the observed behavior is expected? Yes, it's a stateful partition merge, and I think /var/log/messages should have mentioned something about that. When a node is partitioned from the others (e.g. network disconnected), it has t

Re: [Cluster-devel] [PATCH] dlm: check the maximum size of a request from user

2012-09-10 Thread David Teigland
On Sun, Sep 09, 2012 at 04:16:58PM +0200, Sasha Levin wrote: > device_write only checks whether the request size is big enough, but it > doesn't > check if the size is too big. > > At that point, it also tries to allocate as much memory as the user has > requested > even if it's too much. This c

Re: [Cluster-devel] tasks of dlm_recoverd?

2012-08-27 Thread David Teigland
On Mon, Aug 27, 2012 at 01:43:22PM +0200, Heiko Nardmann wrote: > Hi together! > > During the shutdown of my second cluster node (two node cluster) I > have seen a process 'dlm_recoverd' running with 100% CPU usage for > about 6 minutes. > > It's just that I have no idea what is the task of this

Re: [Cluster-devel] [PATCH] cman init: allow dlm hash table sizes to be tunable at startup

2012-07-25 Thread David Teigland
On Wed, Jul 25, 2012 at 07:32:28AM +0200, Fabio M. Di Nitto wrote: > From: "Fabio M. Di Nitto" > > Resolves: rhbz#842370 looks good, thanks > +# DLM_LKBTBL_SIZE - DLM_RSBTBL_SIZE - DLM_DIRTBL_SIZE > +# Allow tuning of DLM kernel hash table sizes. > +# do NOT change unless instructed to do so.

Re: [Cluster-devel] [patch] dlm: remove stray unlock

2012-05-21 Thread David Teigland
On Mon, May 21, 2012 at 05:35:26PM +0300, Dan Carpenter wrote: > Smatch complains that we unlock this twice. It looks like an accidental > to me. Thanks, will fix that.

Re: [Cluster-devel] [patch] dlm: NULL dereference on failure in kmem_cache_create()

2012-05-15 Thread David Teigland
On Tue, May 15, 2012 at 11:58:12AM +0300, Dan Carpenter wrote: > We aren't allowed to pass NULL pointers to kmem_cache_destroy() so if > both allocations fail, it leads to a NULL dereference. thanks, added that to next branch.

Re: [Cluster-devel] GPF in dlm_lowcomms_stop

2012-05-04 Thread David Teigland
On Fri, May 04, 2012 at 11:33:17AM -0600, dann frazier wrote: > On Fri, Mar 30, 2012 at 11:17:56AM -0600, dann frazier wrote: > > On Fri, Mar 30, 2012 at 12:42:40PM -0400, David Teigland wrote: > > > On Fri, Mar 30, 2012 at 11:42:56AM -0400, David Teigland wrote: > >

Re: [Cluster-devel] [GFS2 PATCH] GFS2: Instruct DLM to avoid queue convert slowdowns

2012-04-10 Thread David Teigland
On Tue, Apr 10, 2012 at 10:12:28AM +0100, Steven Whitehouse wrote: > Hi, > > On Thu, 2012-04-05 at 12:11 -0400, Bob Peterson wrote: > > Hi, > > > > Here's another patch (explanation below). This patch relies upon > > a DLM patch that hasn't fully gone upstream yet, so perhaps it > > shouldn't be

Re: [Cluster-devel] GPF in dlm_lowcomms_stop

2012-03-30 Thread David Teigland
On Fri, Mar 30, 2012 at 11:42:56AM -0400, David Teigland wrote: > Hi Dan, I'm not very familiar with this code either, but I've talked with > Chrissie and she suggested we try something like this: A second version that addresses a potentially similar problem in start. dif

Re: [Cluster-devel] GPF in dlm_lowcomms_stop

2012-03-30 Thread David Teigland
On Wed, Mar 21, 2012 at 07:59:13PM -0600, dann frazier wrote: > However... we've dropped the connections_lock, so its possible that a > new connection gets created on line 9. This connection structure would > have pointers to the workqueues that we're about to destroy. Sometime > later on we get da

Re: [Cluster-devel] GFS2: Pre-pull patch posting (merge window)

2012-03-23 Thread David Teigland
On Fri, Mar 23, 2012 at 01:06:05PM -0700, Randy Dunlap wrote: > >> GFS2_FS selects DLM (if GFS2_FS_LOCKING_DLM, which is enabled). > >> GFS2_FS selects IP_SCTP if DLM_SCTP, which is not enabled and not > >> used anywhere else in the kernel tree AFAICT. > >> DLM just always selects IP_SCTP. > > > >

Re: [Cluster-devel] GFS2: Pre-pull patch posting (merge window)

2012-03-23 Thread David Teigland
> on i386: > > ERROR: "sctp_do_peeloff" [fs/dlm/dlm.ko] undefined! > > > GFS2_FS selects DLM (if GFS2_FS_LOCKING_DLM, which is enabled). > GFS2_FS selects IP_SCTP if DLM_SCTP, which is not enabled and not > used anywhere else in the kernel tree AFAICT. > DLM just always selects IP_SCTP. Here's wh

Re: [Cluster-devel] last element of dlm_local_addr[] never used?

2012-03-21 Thread David Teigland
On Wed, Mar 21, 2012 at 12:24:35PM +0300, Dan Carpenter wrote: > In fs/dlm/lowcomms.c we declare the dlm_local_addr[] array like > this: > static struct sockaddr_storage *dlm_local_addr[DLM_MAX_ADDR_COUNT]; > > But it looks like the last element of the array is never used: > > 1072 /* Get loca

Re: [Cluster-devel] [PATCH] fs/dlm/rcom.c: included member.h twice

2012-02-16 Thread David Teigland
On Thu, Feb 16, 2012 at 02:55:21PM +0100, Danny Kukawka wrote: > fs/dlm/rcom.c included 'member.h' twice, remove the duplicate. I'll fold this into the current patch I'm working on. > > Signed-off-by: Danny Kukawka > --- > fs/dlm/rcom.c |1 - > 1 files changed, 0 insertions(+), 1 deletions

[Cluster-devel] gfs2: dlm based recovery coordination

2012-01-09 Thread David Teigland
ink the current merge cycle would be good, but you can send it off whenever you think is right. Dave From 0fb2d7726b570c6a5eb289bac237fb384b9c6f0b Mon Sep 17 00:00:00 2001 From: David Teigland Date: Tue, 20 Dec 2011 17:03:04 -0600 Subject: [PATCH] gfs2: dlm based recovery coordination

[Cluster-devel] gfs2: fail mount if journal recovery fails

2012-01-09 Thread David Teigland
If the first mounter fails to recover one of the journals during mount, the mount should fail. Signed-off-by: David Teigland --- fs/gfs2/incore.h |1 + fs/gfs2/recovery.c |3 ++- 2 files changed, 3 insertions(+), 1 deletions(-) diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index

[Cluster-devel] gfs2: let spectator mount do read only recovery

2012-01-09 Thread David Teigland
only mount on a read only block device. Signed-off-by: David Teigland --- fs/gfs2/incore.h |1 + fs/gfs2/ops_fstype.c |2 +- fs/gfs2/recovery.c |4 +++- 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index 9182a87..59114c5 10064

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2012-01-09 Thread David Teigland
On Mon, Jan 09, 2012 at 11:46:26AM -0500, David Teigland wrote: > On Mon, Jan 09, 2012 at 04:36:30PM +, Steven Whitehouse wrote: > > On Thu, 2012-01-05 at 10:46 -0600, David Teigland wrote: > > > This new method of managing recovery is an alternative to > > > the pre

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2012-01-09 Thread David Teigland
On Mon, Jan 09, 2012 at 04:36:30PM +, Steven Whitehouse wrote: > On Thu, 2012-01-05 at 10:46 -0600, David Teigland wrote: > > This new method of managing recovery is an alternative to > > the previous approach of using the userland gfs_controld. > > > > - use

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2012-01-05 Thread David Teigland
On Thu, Jan 05, 2012 at 04:58:22PM +, Steven Whitehouse wrote: > > + clear_bit(SDF_NOJOURNALID, &sdp->sd_flags); > > + smp_mb__after_clear_bit(); > > + wake_up_bit(&sdp->sd_flags, SDF_NOJOURNALID); > > + ls->ls_first = !!test_bit(DFL_FIRST_MOUNT, &ls->ls_recover_flags); > > + return 0

[Cluster-devel] [PATCH 4/5] dlm: add recovery callbacks

2012-01-05 Thread David Teigland
their kernel counterparts. These callbacks allow the same coordination directly, and more simply. Signed-off-by: David Teigland --- fs/dlm/config.c | 130 ++-- fs/dlm/config.h | 17 +++- fs/dlm/dlm_internal.h | 21 ++ fs/dlm/lockspace.c| 43

[Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2012-01-05 Thread David Teigland
ck to track journals that need recovery Signed-off-by: David Teigland --- fs/gfs2/glock.c |2 +- fs/gfs2/glock.h |7 +- fs/gfs2/incore.h| 58 +++- fs/gfs2/lock_dlm.c | 993 ++- fs/gfs2/m

[Cluster-devel] [PATCH 1/5] dlm: convert rsb list to rb_tree

2012-01-05 Thread David Teigland
From: Bob Peterson Change the linked lists to rb_tree's in the rsb hash table to speed up searches. Slow rsb searches were having a large impact on gfs2 performance due to the large number of dlm locks gfs2 uses. Signed-off-by: Bob Peterson Signed-off-by: David Teigland --- f

[Cluster-devel] [PATCH 0/5] dlm and gfs2 patches for 3.3

2012-01-05 Thread David Teigland
userland. This new feature is not used by current dlm_controld and gfs_controld daemons, but will be enabled by a new dlm_controld version under development. Bob Peterson (1): dlm: convert rsb list to rb_tree David Teigland (4): dlm: move recovery barrier calls dlm: add node slots

[Cluster-devel] [PATCH 3/5] dlm: add node slots and generation

2012-01-05 Thread David Teigland
slot number. A new generation number is also added to a lockspace. It is set and incremented during each recovery along with the slot collection/assignment. The slot numbers will be passed to gfs2 which will use them as journal id's. Signed-off-by: David Teigland --- fs/dlm/dlm_internal.h |

[Cluster-devel] [PATCH 2/5] dlm: move recovery barrier calls

2012-01-05 Thread David Teigland
Put all the calls to recovery barriers in the same function to clarify where they each happen. Should not change any behavior. Also modify some recovery debug lines to make them consistent. Signed-off-by: David Teigland --- fs/dlm/dir.c |1 - fs/dlm/member.c |7 +-- fs/dlm

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2012-01-05 Thread David Teigland
On Thu, Jan 05, 2012 at 03:40:09PM +, Steven Whitehouse wrote: > I think it would be a good plan to not send this last patch for the > current merge window and let it settle for a bit longer. Running things > so fine with the timing makes me nervous bearing in mind the number of > changes, To

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2012-01-05 Thread David Teigland
> | - use dlm recovery callbacks to initiate journal recovery > | - use a dlm lock to determine the first node to mount fs > | - use a dlm lock to track journals that need recovery > | > | Signed-off-by: David Teigland > | --- > | --- a/fs/gfs2/lock_dlm.c > | +++ b/fs/gfs2/lock

Re: [Cluster-devel] [patch] gfs2: make some sizes unsigned in set_recover_size()

2012-01-04 Thread David Teigland
> [patch] dlm: le32 vs le16 > gfs2: make some sizes unsigned in set_recover_size() Thanks, I've folded in both of those. Dave

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2011-12-22 Thread David Teigland
On Mon, Dec 19, 2011 at 12:47:38PM -0500, David Teigland wrote: > On Mon, Dec 19, 2011 at 01:07:38PM +, Steven Whitehouse wrote: > > > struct lm_lockstruct { > > > int ls_jid; > > > unsigned int ls_first; > > > - unsigned int ls_first_done;

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2011-12-21 Thread David Teigland
On Wed, Dec 21, 2011 at 10:45:21AM +, Steven Whitehouse wrote: > I don't think I understand what's going on in that case. What I thought > should be happening was this: > > - Try to get mounter lock in EX >- If successful, then we are the first mounter so recover all > journals >-

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2011-12-20 Thread David Teigland
On Tue, Dec 20, 2011 at 02:16:43PM -0500, David Teigland wrote: > On Tue, Dec 20, 2011 at 10:39:08AM +, Steven Whitehouse wrote: > > > I dislike arbitrary delays also, so I'm hesitant to add them. > > > The choices here are: > > > - removing NOQUEUE from t

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2011-12-20 Thread David Teigland
On Tue, Dec 20, 2011 at 10:39:08AM +, Steven Whitehouse wrote: > > I dislike arbitrary delays also, so I'm hesitant to add them. > > The choices here are: > > - removing NOQUEUE from the requests below, but with NOQUEUE you have a > > much better chance of killing a mount command, which is a

Re: [Cluster-devel] [PATCH 4/5] dlm: add recovery callbacks

2011-12-19 Thread David Teigland
On Mon, Dec 19, 2011 at 12:36:57PM +, Steven Whitehouse wrote: > > + struct dlm_lockspace_ops ls_ops; > ^^ I'd suggest just keeping a pointer to > this, see below. > > +static int new_lockspace(const char *name, const char *cluster, uint32_t > > flags, > > +

Re: [Cluster-devel] [PATCH 3/5] dlm: add node slots and generation

2011-12-19 Thread David Teigland
> Nit, but this should have some spaces, iow, "i + 1;" > -error = check_config(ls, rc, nodeid); > +error = check_rcom_config(ls, rc, nodeid); yeah, I'll change those, thanks

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2011-12-19 Thread David Teigland
On Mon, Dec 19, 2011 at 01:07:38PM +, Steven Whitehouse wrote: > > struct lm_lockstruct { > > int ls_jid; > > unsigned int ls_first; > > - unsigned int ls_first_done; > > unsigned int ls_nodir; > Since ls_flags and ls_first are also only boolean flags, they could > potentially b

[Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2011-12-16 Thread David Teigland
ck to track journals that need recovery Signed-off-by: David Teigland --- fs/gfs2/glock.c |2 +- fs/gfs2/glock.h |7 +- fs/gfs2/incore.h| 51 ++- fs/gfs2/lock_dlm.c | 979 ++- fs/gfs2/m

[Cluster-devel] [PATCH 4/5] dlm: add recovery callbacks

2011-12-16 Thread David Teigland
their kernel counterparts. These callbacks allow the same coordination directly, and more simply. Signed-off-by: David Teigland --- fs/dlm/config.c | 130 ++-- fs/dlm/config.h | 17 +++- fs/dlm/dlm_internal.h | 20 ++ fs/dlm/lockspace.c| 37

[Cluster-devel] [PATCH 2/5] dlm: move recovery barrier calls

2011-12-16 Thread David Teigland
Put all the calls to recovery barriers in the same function to clarify where they each happen. Should not change any behavior. Also modify some recovery debug lines to make them consistent. Signed-off-by: David Teigland --- fs/dlm/dir.c |1 - fs/dlm/member.c |7 +-- fs/dlm

[Cluster-devel] [PATCH 1/5] dlm: convert rsb list to rb_tree

2011-12-16 Thread David Teigland
From: Bob Peterson Change the linked lists to rb_tree's in the rsb hash table to speed up searches. Slow rsb searches were having a large impact on gfs2 performance due to the large number of dlm locks gfs2 uses. Signed-off-by: Bob Peterson Signed-off-by: David Teigland --- f

[Cluster-devel] [PATCH 3/5] dlm: add node slots and generation

2011-12-16 Thread David Teigland
slot number. A new generation number is also added to a lockspace. It is set and incremented during each recovery along with the slot collection/assignment. The slot numbers will be passed to gfs2 which will use them as journal id's. Signed-off-by: David Teigland --- fs/dlm/dlm_internal.h |

[Cluster-devel] dlm patches

2011-12-16 Thread David Teigland
This is the current series of dlm patches from https://github.com/teigland/linux-dlm/tree/devel9 The first is already pushed to linux-next for the next merge cycle. The others, which allow gfs2 to be used without gfs_controld, are still being tested, and may be ready for the next merge cycle, depe

Re: [Cluster-devel] GFS2: glock statistics gathering (RFC)

2011-11-04 Thread David Teigland
On Fri, Nov 04, 2011 at 04:57:31PM +, Steven Whitehouse wrote: > Hi, > > On Fri, 2011-11-04 at 12:31 -0400, David Teigland wrote: > > On Fri, Nov 04, 2011 at 03:19:49PM +, Steven Whitehouse wrote: > > > The three pairs of mean/variance measure the following >

Re: [Cluster-devel] GFS2: glock statistics gathering (RFC)

2011-11-04 Thread David Teigland
On Fri, Nov 04, 2011 at 03:19:49PM +, Steven Whitehouse wrote: > The three pairs of mean/variance measure the following > things: > > 1. DLM lock time (non-blocking requests) You don't need to track and save this value, because all results will be one of three values which can gather once:

Re: [Cluster-devel] [Upstream patch] DLM: Convert rsb data from linked list to rb_tree

2011-10-25 Thread David Teigland
Hi Bob, I've made a few minor/cosmetic changes and attached my current version (not tested yet). > static int shrink_bucket(struct dlm_ls *ls, int b) > { > + struct rb_node *n = NULL; > struct dlm_rsb *r; > int count = 0, found; > > for (;;) { > found = 0; >

Re: [Cluster-devel] cluster4 gfs_controld

2011-10-13 Thread David Teigland
On Thu, Oct 13, 2011 at 05:16:29PM +0100, Steven Whitehouse wrote: > Hi, > > On Thu, 2011-10-13 at 11:30 -0400, David Teigland wrote: > > On Thu, Oct 13, 2011 at 03:41:31PM +0100, Steven Whitehouse wrote: > > > > cluster4 > > > > . jid from dlm-kernel

Re: [Cluster-devel] cluster4 gfs_controld

2011-10-13 Thread David Teigland
On Thu, Oct 13, 2011 at 03:41:31PM +0100, Steven Whitehouse wrote: > > cluster4 > > . jid from dlm-kernel "slots" which will be assigned similarly > What is the actual algorithm used to assign these slots? The same as picking jids: lowest unused id starting with 0. As for implementation, I'll add
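
The "lowest unused id starting with 0" rule mentioned in the reply above can be sketched in a few lines of C. This is only an illustration of the rule as stated, not the dlm or gfs_controld implementation: the function name lowest_unused_id() and the flat array of in-use ids are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the dlm source): return the lowest
 * id >= 0 that does not appear in used[0..n_used-1].
 *
 * The outer loop tries 0, 1, 2, ... in order; since at most n_used
 * distinct ids are taken, it returns after at most n_used + 1 tries.
 */
int lowest_unused_id(const int *used, size_t n_used)
{
    for (int candidate = 0; ; candidate++) {
        bool taken = false;

        /* Linear scan: is this candidate already assigned? */
        for (size_t i = 0; i < n_used; i++) {
            if (used[i] == candidate) {
                taken = true;
                break;
            }
        }
        if (!taken)
            return candidate;
    }
}
```

With ids 0, 1 and 3 in use, a joining node would get 2 under this rule; with nothing in use, the first node gets 0. The quadratic scan is chosen for clarity and is fine for cluster-sized node counts.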

Re: [Cluster-devel] cluster4 gfs_controld

2011-10-13 Thread David Teigland
On Fri, Oct 14, 2011 at 12:02:27AM +0900, Masatake YAMATO wrote: > Just a question. > I'm happy if you give me a hint. > > > ... > > cluster3 dlm/gfs recovery > > . dlm_controld sees nodedown (libcpg) > > . gfs_controld sees nodedown (libcpg) > > . dlm_con

[Cluster-devel] cluster4 gfs_controld

2011-10-13 Thread David Teigland
Here's the outline of my plan to remove/replace the essential bits of gfs_controld in cluster4. I expect it'll go away entirely, but there could be one or two minor things it would still handle on the side. kernel dlm/gfs2 will continue to be operable with either . cluster3 dlm_controld/gfs_contr

Re: [Cluster-devel] [Upstream patch] DLM: Convert rsb data from linked list to rb_tree

2011-10-10 Thread David Teigland
On Mon, Oct 10, 2011 at 08:00:07PM +0100, Steven Whitehouse wrote: > > The fact remains that caching "as much as possible" tends to be harmful, > > and some careful limiting would be a good investment. > > > There is a limit. The point is that the limit is dynamic and depends on > memory pressure.

Re: [Cluster-devel] [Upstream patch] DLM: Convert rsb data from linked list to rb_tree

2011-10-10 Thread David Teigland
On Mon, Oct 10, 2011 at 04:51:01PM +0100, Steven Whitehouse wrote: > Hi, > > On Mon, 2011-10-10 at 10:43 -0400, David Teigland wrote: > > On Sat, Oct 08, 2011 at 06:13:52AM -0400, Bob Peterson wrote: > > > - Original Message - > > > | On Wed, Oct 05, 2011

Re: [Cluster-devel] [coverity] liblogthread

2011-10-10 Thread David Teigland
On Mon, Oct 10, 2011 at 10:45:17AM +0200, Fabio M. Di Nitto wrote: > This is the first patchset to address some issues spotted by Coverity scan. look fine

Re: [Cluster-devel] [Upstream patch] DLM: Convert rsb data from linked list to rb_tree

2011-10-10 Thread David Teigland
On Sat, Oct 08, 2011 at 06:13:52AM -0400, Bob Peterson wrote: > - Original Message - > | On Wed, Oct 05, 2011 at 03:25:39PM -0400, Bob Peterson wrote: > | > Hi, > | > > | > This upstream patch changes the way DLM keeps track of RSBs. > | > Before, they were in a linked list off a hash tabl

Re: [Cluster-devel] dlm: master - add license/copyright headers

2011-10-06 Thread David Teigland
On Thu, Oct 06, 2011 at 08:02:10PM +0200, Fabio M. Di Nitto wrote: > Hi David, > > this is going to need another quick pass. > > The libdlm headers are fine, but for the daemon/tool, we had GPLv2+ in > STABLE31 and current header only reflects GPLv2. I'm defaulting to plain v2 unless there's a r

Re: [Cluster-devel] [Upstream patch] DLM: Convert rsb data from linked list to rb_tree

2011-10-05 Thread David Teigland
On Wed, Oct 05, 2011 at 03:25:39PM -0400, Bob Peterson wrote: > Hi, > > This upstream patch changes the way DLM keeps track of RSBs. > Before, they were in a linked list off a hash table. Now, > they're an rb_tree off the same hash table. This speeds up > DLM lookups greatly. > > Today's DLM is

Re: [Cluster-devel] dlm: master - dlm: clear out old stuff and build system

2011-10-03 Thread David Teigland
> dlm/libdlm/libdlm.pc.in | 11 - > dlm/libdlm/libdlm_lt.pc.in| 11 - > > dropping the .pc file is going to break dlm users. > > pc files are used by different build systems (not just > autotools/autoconf) to detect libdlm and link against it correctly. > > Similar

Re: [Cluster-devel] dlm: master - dlm_controld: remove ccs and new Makefile

2011-09-30 Thread David Teigland
On Fri, Sep 30, 2011 at 01:07:01PM +0200, Fabio M. Di Nitto wrote: > On 09/30/2011 12:02 AM, David Teigland wrote: > > > add a normal, sane Makefile > > If you plan to drop autoconf+autotool, that is your call (I disagree for > several reasons, but it's your project a

[Cluster-devel] [PATCH 3/3] gfs_controld: full check for member changes

2011-09-28 Thread David Teigland
the incarnation numbers of members from consecutive queries to avoid this. bz 663397 Signed-off-by: David Teigland --- group/gfs_controld/member_cman.c | 51 +++--- 1 files changed, 47 insertions(+), 4 deletions(-) diff --git a/group/gfs_controld/member_cman.c

[Cluster-devel] [PATCH 1/3] fenced: full check for member changes

2011-09-28 Thread David Teigland
the incarnation numbers of members from consecutive queries to avoid this. bz 663397 Signed-off-by: David Teigland --- fence/fenced/member_cman.c | 36 1 files changed, 36 insertions(+), 0 deletions(-) diff --git a/fence/fenced/member_cman.c b/fence

[Cluster-devel] [PATCH 2/3] dlm_controld: full check for member changes

2011-09-28 Thread David Teigland
the incarnation numbers of members from consecutive queries to avoid this. bz 663397 Signed-off-by: David Teigland --- group/dlm_controld/member_cman.c | 79 -- 1 files changed, 75 insertions(+), 4 deletions(-) diff --git a/group/dlm_controld/member_cman.c

Re: [Cluster-devel] dlm: master - dlm_controld: new plock state transfer

2011-09-26 Thread David Teigland
On Sat, Sep 24, 2011 at 07:13:34AM +0200, Fabio M. Di Nitto wrote: > Quick question.. deadlock.c/netlink.c have been dropped from the build > and not referenced anywhere for distribution. Is it a plan to kill them > completely or do they need porting? I'm going to leave the files there as artifact

Re: [Cluster-devel] DLM + SCTP bug (was Re: [DRBD-user] kernel panic with DRBD: solved)

2011-09-12 Thread David Teigland
> >> When node A starts back up, the SCTP protocol notices this (as it's > >> supposed to), and delivers an SCTP_ASSOC_CHANGE / SCTP_RESTART > >> notification to the SCTP socket, telling the socket owner (the dlm_recv > >> thread) that the other node has restarted. DLM responds by telling SCTP > >

Re: [Cluster-devel] [PATCH 4/5] gfs_controld: Remove dead code from loop()

2011-09-06 Thread David Teigland
On Tue, Sep 06, 2011 at 01:00:16PM +0100, Andrew Price wrote: > This patch removes an if statement where the true branch is never taken. > At this point in the code, poll_timeout could only be 500 or -1. > > Signed-off-by: Andrew Price > --- > group/gfs_controld/main.c |3 --- > 1 files chan

[Cluster-devel] [PATCH] dlm_controld: fix plock dev_write no op

2011-08-18 Thread David Teigland
. This is because the kernel generates extraneous plock unlock requests when files are closed with flocks. Because dlm_controld finds no plocks on the files, it replies to the kernel with an error, rather than skipping the reply to do CLOSE. bz 731775 Signed-off-by: David Teigland --- group/dl

Re: [Cluster-devel] [PATCH v3] fs, dlm: don't do pointless NULL check, use kzalloc and fix order of arguments

2011-07-11 Thread David Teigland
On Sun, Jul 10, 2011 at 10:54:31PM +0200, Jesper Juhl wrote: > In fs/dlm/lock.c in the dlm_scan_waiters() function there are 3 small > issues: > > 1) There's no need to test the return value of the allocation and do a > memset if it succeeds. Just use kzalloc() to obtain zeroed memory. > > 2) Si

Re: [Cluster-devel] [RFT] dlm: replace lkb hash table with idr

2011-07-08 Thread David Teigland
On Wed, Jul 06, 2011 at 12:14:26PM -0400, David Teigland wrote: > Request for testing > > I'm looking at possible improvements to the dlm hash tables. I've pushed this and another patch related to hash table performance to the tmp-testing branch, git://git.kernel.org/pub/s

[Cluster-devel] [RFT] dlm: replace lkb hash table with idr

2011-07-06 Thread David Teigland
Request for testing I'm looking at possible improvements to the dlm hash tables. This patch keeps lkbs in an idr instead of a hash table. Before pushing this patch further, I'd like to know if it makes any difference in environments using millions of locks on each node. From: Davi

Re: [Cluster-devel] [PATCH] fs, dlm: Don't leak, don't do pointless NULL checks and use kzalloc

2011-06-29 Thread David Teigland
On Wed, Jun 29, 2011 at 11:51:00PM +0200, Jesper Juhl wrote: > > I don't think so; num_nodes won't be set to zero. > > Hmm. How so? Maybe I'm missing something obvious, but; > num_nodes is initialized to zero at the beginning of the function, which > means that we'll definitely do the first allo

Re: [Cluster-devel] [PATCH] fs, dlm: Don't leak, don't do pointless NULL checks and use kzalloc

2011-06-29 Thread David Teigland
On Wed, Jun 29, 2011 at 11:09:27PM +0200, Jesper Juhl wrote: > In fs/dlm/lock.c in the dlm_scan_waiters() function there are 3 small > issues: > > 1) first time through the loop we allocate memory for 'warned', if we > then (in the loop) don't take the "if (!warned)" path and loop again, > the sec

Re: [Cluster-devel] [PATCH 0/3] dlm_controld: Improvement on plock searching efficiency

2011-05-27 Thread David Teigland
On Fri, May 27, 2011 at 07:44:03AM +0800, Jiaju Zhang wrote: > This series introduces a RB tree for improving plock resources searching > efficiency. We met this performance issue when running Samba on top of > cluster filesystem, profiling during nbench runs with num-progs=500, the > dlm_controld

Re: [Cluster-devel] [PATCH 26/34] dlm: Drop __TIME__ usage

2011-05-25 Thread David Teigland
> > Cc: Christine Caulfield > > Cc: David Teigland > > Cc: cluster-devel@redhat.com > > Signed-off-by: Michal Marek > > --- > > fs/dlm/main.c |2 +- > > 1 files changed, 1 insertions(+), 1 deletions(-) > > Hi, > > I don't see this

[Cluster-devel] dlm_controld: clear waiting plocks for closed files

2011-05-23 Thread David Teigland
when the process is killed. So the unlock-close also needs to clear any waiting plocks that were abandoned by the killed process. The corresponding kernel patch: https://lkml.org/lkml/2011/5/23/237 Signed-off-by: David Teigland --- group/dlm_controld/plock.c | 28 1

Re: [Cluster-devel] [PATCH] dlm: Remove superfluous call to recalc_sigpending()

2011-03-28 Thread David Teigland
On Thu, Mar 24, 2011 at 01:56:47PM +, Matt Fleming wrote: > From: Matt Fleming > > recalc_sigpending() is called within sigprocmask(), so there is no > need call it again after sigprocmask() has returned. Thanks, pushed to dlm.git next. Dave

Re: [Cluster-devel] RFC: generic improvement to fence agents api

2011-03-21 Thread David Teigland
On Sat, Mar 19, 2011 at 07:34:55AM +0100, Fabio M. Di Nitto wrote: > My suggestion would be to allow to specify a list of ports instead. This comes up now and then. The current rule of one action per agent execution is a tried and true, fundamental property of the agent api. It should not be chan

[Cluster-devel] fenced: don't ignore victim_done messages for reduced victims

2011-02-22 Thread David Teigland
ded for the node, causing dlm_controld to wait indefinitely for fencing to complete for the reduced victim. The fix is to simply record the information from a victim_done message even if the node is not in the victims list. bz 678704 Signed-off-by: David Teigland --- fence/fenced/cpg.c | 18 ++

Re: [Cluster-devel] [PATCH] dlm: Reset fs_notified when check_fs_done

2011-02-22 Thread David Teigland
On Tue, Feb 22, 2011 at 04:35:42PM +0800, Jiaju Zhang wrote: > On Tue, Nov 9, 2010 at 6:06 AM, David Teigland wrote: > > On Mon, Nov 08, 2010 at 11:05:49PM +0800, Jiaju Zhang wrote: > >> Luckily, things have changed now. One user met this issue two months > >> ago a

Re: [Cluster-devel] [PATCH] fenced: send dbus signal when node is fenced

2011-02-03 Thread David Teigland
On Thu, Feb 03, 2011 at 01:26:07PM -0600, Ryan O'Hara wrote: > This patch adds the ability to send a dbus signal when a node is fenced. > This code can reestablish a connection with dbus if necessary. ACK

Re: [Cluster-devel] [PATCH] fenced: send dbus signal when node is fenced

2011-02-03 Thread David Teigland
> +void dbus_init (void) No space before ( Also, it would be a good idea to put a fenced-specific prefix before fenced's own dbus functions, e.g. fd_dbus_init(), because dbus_ is the dbus lib's namespace and open to symbol collisions. > +{ > +#ifdef DBUS > + > +if (!(bus = dbus_bus_get_priva

Re: [Cluster-devel] [PATCH] dlm: send_bast_queue() skip list loop not only sending basts to convertqueue

2011-01-04 Thread David Teigland
On Tue, Jan 04, 2011 at 06:06:51PM -0200, cmaiol...@redhat.com wrote: > The resource groups got corrupted without this patch: I could see an extraneous bast leading to confusion in gfs2 about the lock state, but gfs2 should probably be asserting somewhere before it actually corrupts anything... >

Re: [Cluster-devel] [PATCH] dlm: sanitize work_start() in lowcomms.c

2010-12-13 Thread David Teigland
On Tue, Dec 14, 2010 at 12:28:25AM +0900, Namhyung Kim wrote: > The create_workqueue() returns NULL if failed rather than ERR_PTR(). > Fix error checking and remove unnecessary variable 'error'. I adapted this to the alloc_workqueue patch in next and pushed to next. Dave
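The distinction Namhyung Kim points out can be sketched in user space. This is only an illustration, not the lowcomms.c change itself: `demo_is_err()`, `demo_create_workqueue()` and both `work_start_*` functions are hypothetical stand-ins, and the assumption that the old code tested the result with an IS_ERR()-style check is inferred from the patch description. The point is that a NULL return is not inside the error-pointer range, so an IS_ERR()-style test lets the failure through, while a plain NULL test catches it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's error-pointer convention. */
#define DEMO_MAX_ERRNO 4095
#define DEMO_ENOMEM    12

/* True only for pointers in the top DEMO_MAX_ERRNO addresses. NULL is not one. */
static int demo_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

/* Hypothetical allocator that reports failure by returning NULL. */
static void *demo_create_workqueue(int simulate_failure)
{
    static int dummy_wq;
    return simulate_failure ? NULL : (void *)&dummy_wq;
}

/* Mismatched check: a NULL failure is not an error pointer, so it is missed. */
static int work_start_is_err_check(int simulate_failure)
{
    void *wq = demo_create_workqueue(simulate_failure);
    if (demo_is_err(wq))
        return -DEMO_ENOMEM;
    return 0; /* reached even when wq == NULL */
}

/* Matching check: test for NULL, because that is what the allocator returns. */
static int work_start_null_check(int simulate_failure)
{
    void *wq = demo_create_workqueue(simulate_failure);
    if (!wq)
        return -DEMO_ENOMEM;
    return 0;
}
```

With a simulated failure, the first variant reports success and the second returns -DEMO_ENOMEM, which is why the check has to match the allocator's failure convention.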

Re: [Cluster-devel] Patch: making DLM more robust

2010-12-01 Thread David Teigland
On Wed, Dec 01, 2010 at 10:23:25AM +0100, Menyhart Zoltan wrote: > If we cannot obtain a given resource within a limited time frame, > then it is a real error for the customer: s/he cannot mount an OCFS2 > volume, cannot issue a cluster command, etc. Matter of opinion and preference I suppose. >

Re: [Cluster-devel] Patch: making DLM more robust

2010-11-30 Thread David Teigland
On Tue, Nov 30, 2010 at 05:57:50PM +0100, Menyhart Zoltan wrote: > Hi, > > An easy first step to make DLM more robust can be adding a time out protection > to the lock space creation operation, while waiting for a "dlm_controld" > action. > A new member "ci_dlm_controld_secs" is added to "dlm_con

Re: [Cluster-devel] "->ls_in_recovery" not released

2010-11-24 Thread David Teigland
On Wed, Nov 24, 2010 at 05:13:40PM +0100, Menyhart Zoltan wrote: > Could you please indicate the exact URL? The current fedora packages, or https://www.redhat.com/archives/cluster-devel/2010-October/msg8.html or http://git.fedorahosted.org/git/?p=cluster.git;a=shortlog;h=refs/heads/STABLE31 >

Re: [Cluster-devel] "->ls_in_recovery" not released

2010-11-23 Thread David Teigland
On Tue, Nov 23, 2010 at 03:58:42PM +0100, Menyhart Zoltan wrote: > David Teigland wrote: > >On Mon, Nov 22, 2010 at 05:31:25PM +0100, Menyhart Zoltan wrote: > >>We have got a two-node OCFS2 file system controlled by the pacemaker. > > > >Are you using dlm_contro

Re: [Cluster-devel] "->ls_in_recovery" not released

2010-11-22 Thread David Teigland
On Mon, Nov 22, 2010 at 05:31:25PM +0100, Menyhart Zoltan wrote: > We have got a two-node OCFS2 file system controlled by the pacemaker. Are you using dlm_controld.pcmk? If so, please try the latest versions of pacemaker that use the standard dlm_controld. The problem may be related to the locks

Re: [Cluster-devel] dlm: Use cmwq for send and receive workqueues

2010-11-12 Thread David Teigland
On Fri, Nov 12, 2010 at 04:20:35PM +, Steven Whitehouse wrote: > Hi, > > On Fri, 2010-11-12 at 11:12 -0500, David Teigland wrote: > > On Fri, Nov 12, 2010 at 12:12:29PM +, Steven Whitehouse wrote: > > > > > > So far as I can tell, there is no reason to

Re: [Cluster-devel] [PATCH] dlm: Handle application limited situations properly.

2010-11-12 Thread David Teigland
On Wed, Nov 10, 2010 at 09:56:39PM -0800, David Miller wrote: > > In the normal regime where an application uses non-blocking I/O > writes on a socket, they will handle -EAGAIN and use poll() to > wait for send space. > > They don't actually sleep on the socket I/O write. > > But kernel level RP
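The user-space pattern described in the quoted text (handle EAGAIN on a non-blocking write, then wait in poll() for send space) looks roughly like the sketch below. It is a generic POSIX illustration only: `write_all_nonblocking()` is a hypothetical helper name, and nothing here reflects how the dlm kernel code or the patch under discussion is written.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write len bytes to a non-blocking descriptor. When the kernel reports
 * EAGAIN/EWOULDBLOCK, block in poll() until the descriptor is writable
 * again instead of sleeping inside write(). Returns 0 on success, -1 on error.
 */
static int write_all_nonblocking(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);

        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue; /* interrupted by a signal, retry the write */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };

            /* Wait for send space; retry the write once poll() returns. */
            if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                return -1;
            continue;
        }
        return -1; /* hard error, or an unexpected zero-length write */
    }
    return 0;
}
```

The caller sets O_NONBLOCK on the descriptor (for example with fcntl()) before using the helper; the wait happens in poll(), not in the write itself.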

Re: [Cluster-devel] dlm: Use cmwq for send and receive workqueues

2010-11-12 Thread David Teigland
On Fri, Nov 12, 2010 at 12:12:29PM +, Steven Whitehouse wrote: > > So far as I can tell, there is no reason to use a single-threaded > send workqueue for dlm, since it may need to send to several sockets > concurrently. Both workqueues are set to WQ_MEM_RECLAIM to avoid > any possible deadlock

Re: [Cluster-devel] [PATCH] dlm: Reset fs_notified when check_fs_done

2010-11-08 Thread David Teigland
On Mon, Nov 08, 2010 at 11:05:49PM +0800, Jiaju Zhang wrote: > Luckily, things have changed now. One user met this issue two months > ago and he also very kindly tested the patch. The result is that the > patch really works. > > Attached is the log before they apply the patch. This time the log > ha

Re: [Cluster-devel] dlm: Don't send callback to node making lock request when "try 1cb" fails

2010-09-03 Thread David Teigland
also speeds up GFS as well. In the GFS2 case the performance gain is over 10x for cases of write activity to an inode whose glock is cached on another, idle (wrt that glock) node. (comment added, dct) Signed-off-by: Steven Whitehouse Tested-by: Abhijith Das Signed-off-by: David Teigland --- fs/dl

Re: [Cluster-devel] [PATCH] dlm: use genl_register_family_with_ops()

2010-07-26 Thread David Teigland
On Mon, Jul 26, 2010 at 05:19:19PM +0800, Changli Gao wrote: > Signed-off-by: Changli Gao > > fs/dlm/netlink.c | 15 +-- > 1 file changed, 1 insertion(+), 14 deletions(-) > diff --git a/fs/dlm/netlink.c b/fs/dlm/netlink.c > index 2c6ad51..ef17e01 100644 > --- a/fs/dlm/netlink.c

Re: [Cluster-devel] GFS2: Wait for journal id on mount if not specified on mount command line

2010-06-08 Thread David Teigland
On Tue, Jun 08, 2010 at 09:34:52AM +0100, Steven Whitehouse wrote: > > A couple obvious questions from the start... > > - What if gfs_controld isn't running? > It will hang until mount is killed, whereupon it will clean up and exit > gracefully. Right, so instead of failing with an error, it hang
