Re: [Cluster-devel] [PATCH v2 1/3] dlm: check if workqueues are NULL before flushing/destroying

2019-04-05 Thread David Teigland
I tried these with one address and had no problem, so I've pushed them to the linux-dlm next branch. Dave

Re: [Cluster-devel] [GFS2 PATCH] gfs2: Panic when an io error occurs writing to the journal

2018-12-17 Thread David Teigland
On Mon, Dec 17, 2018 at 09:58:47AM -0500, Bob Peterson wrote: > Dave Teigland recommended. Unless I'm mistaken, Dave has said that GFS2 > should never withdraw; it should always just kernel panic (Dave, correct > me if I'm wrong). At least this patch confines that behavior to a small > subset of

Re: [Cluster-devel] lost idr_destroy for ls_recover_idr in release_lockspace() ?

2018-11-15 Thread David Teigland
On Thu, Nov 15, 2018 at 09:49:17AM +0300, Vasily Averin wrote: > Dear David, > I've noticed that release_lockspace() lacks idr_destroy(&ls->ls_recover_idr), > though it is called on rollback in new_lockspace(). > > It seems for me it is not critical, and should not lead to any leaks, > however could

Re: [Cluster-devel] [PATCH 0/3] dlm: fix various incorrect behaviors

2018-11-07 Thread David Teigland
On Fri, Nov 02, 2018 at 02:18:19PM -0600, Tycho Andersen wrote: > Hi, > > here's a series to fix some bugs I noticed in the DLM. The third patch > in the series and maybe the first should probably go to stable, assuming > everyone agrees they're indeed bugs. > > Thanks, > > Tycho > > Tycho

Re: [Cluster-devel] How to enable daemon_debug for dlm_controld

2018-06-27 Thread David Teigland
On Wed, Jun 27, 2018 at 04:07:18PM +0800, Guoqing Jiang wrote: > But by default, seems dlm_controld just run with "-s 0". And I tried to add > "daemon_debug=1" to /etc/dlm/dlm.conf, > then dlm resource can't start at all. Could you tell me how to enable this > option? Thanks in advance! That

Re: [Cluster-devel] [PATCH] dlm: prompt the user SCTP is experimental

2018-04-02 Thread David Teigland
On Thu, Mar 22, 2018 at 10:27:56PM -0600, Gang He wrote: > Hello David, > > Do you agree to add this prompt to the user? > Since sometimes customers attempted to setup SCTP protocol with two rings, > but they could not get the expected result, then it maybe bring some concerns > to the

Re: [Cluster-devel] [ClusterLabs] DLM connection channel switch take too long time (> 5mins)

2018-03-08 Thread David Teigland
> I use active rrp_mode in corosync.conf and reboot the cluster to let the > configuration effective. > But, the about 5 mins hang in new_lockspace() function is still here. The last time I tested connection failures with sctp was several years ago, but I recall seeing similar problems. I had

Re: [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup

2017-10-09 Thread David Teigland
On Sat, Oct 07, 2017 at 03:26:11AM +0100, Al Viro wrote: > On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote: > > According to fs/dlm/lock.c, the kernel may sleep under a spinlock, > > and the function call path is: > > dlm_master_lookup (acquire the spinlock) > >

Re: [Cluster-devel] [PATCH] dlm/recoverd: recheck kthread_should_stop() before schedule()

2017-09-25 Thread David Teigland
On Mon, Sep 25, 2017 at 03:47:50PM +0800, Guoqing Jiang wrote: > Call schedule() here could make the thread miss wake > up from kthread_stop(), so it is better to recheck > kthread_should_stop() before call schedule(), a symptom > happened when I run indefinite test (which mostly created >

Re: [Cluster-devel] [PATCH 00/18] [try #2] DLM: dlm patches need review

2017-09-18 Thread David Teigland
The patches are now here for testing https://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git/log/?h=next Dave

Re: [Cluster-devel] [PATCH 00/18] [try #2] DLM: dlm patches need review

2017-09-13 Thread David Teigland
On Wed, Sep 13, 2017 at 10:51:06AM +0100, Steven Whitehouse wrote: > Hi, > > > On 12/09/17 09:54, tsutomu@toshiba.co.jp wrote: > > Hi, > > > > This series of patches (2nd version after previous review on August) is to > > fix various bugs. This patch set is against the mainline kernel. > >

Re: [Cluster-devel] [PATCH 13/18] [try #2] DLM: fix conversion deadlock when DLM_LKF_NODLCKWT flag is set

2017-09-12 Thread David Teigland
On Tue, Sep 12, 2017 at 09:01:31AM +0000, tsutomu@toshiba.co.jp wrote: > When the DLM_LKF_NODLCKWT flag was set, even if conversion deadlock > was detected, the caller of can_be_granted() was unknown. > We change the behavior of can_be_granted() and change it to detect > conversion deadlock

Re: [Cluster-devel] [PATCH 10/17] dlm: use schedule_timeout instead of schedule in dlm_recoverd

2017-08-22 Thread David Teigland
On Thu, Aug 17, 2017 at 11:40:13PM +0000, tsutomu@toshiba.co.jp wrote: > If you refer to other implementations in kernel, the following > modifications may be better. > The important thing is to call kthread_should_stop() after > set_current_state(TASK_INTERRUPTIBLE). How is this fix?

Re: [Cluster-devel] [PATCH 08/17] dlm: retry rcom when dlm_wait_function is timed out.

2017-08-09 Thread David Teigland
On Wed, Aug 09, 2017 at 05:50:31AM +0000, tsutomu@toshiba.co.jp wrote: > If a node sends a DLM_RCOM_STATUS command and an error occurs on the > receiving side, the DLM_RCOM_STATUS_REPLY response may not be returned. > We retransmitted the DLM_RCOM_STATUS command so that we do not wait for > an

Re: [Cluster-devel] [PATCH 10/17] dlm: use schedule_timeout instead of schedule in dlm_recoverd

2017-08-09 Thread David Teigland
On Wed, Aug 09, 2017 at 05:51:01AM +0000, tsutomu@toshiba.co.jp wrote: > When dlm_recoverd_stop() is called between kthread_should_stop() and > set_task_state(), dlm_recoverd will not wake up. This works, but have you looked elsewhere in the kernel for kthread examples we could copy that do a

Re: [Cluster-devel] [PATCH] dlm: use sock_create_lite inside tcp_accept_from_sock

2017-08-07 Thread David Teigland
On Mon, Aug 07, 2017 at 02:31:20PM +0800, Guoqing Jiang wrote: > To resolve the issue, we need to use sock_create_lite > instead of sock_create_kern, like commit 0933a578cd55 > ("rds: tcp: use sock_create_lite() to create the accept > socket") did. Thanks, this is now in linux-dlm next. Dave

Re: [Cluster-devel] [PATCH v2] fs/dlm: Fix kernel memory disclosure

2017-02-22 Thread David Teigland
On Wed, Feb 22, 2017 at 03:45:34PM +0800, Vlad Tsyrklevich wrote: > Hello, I wanted to ping the list and see if this could get a review: now pushed to linux-dlm.git > > Clear the 'unused' field and the uninitialized padding in 'lksb' to > > avoid leaking memory to userland in

Re: [Cluster-devel] Question on LVB when the node that held EX lock crash

2016-11-30 Thread David Teigland
On Wed, Nov 30, 2016 at 05:07:22PM +0800, Eric Ren wrote: > @@ -852,12 +868,19 @@ void dlm_recover_rsbs(struct dlm_ls *ls) > if (is_master(r)) { > if (rsb_flag(r, RSB_RECOVER_CONVERT)) > recover_conversion(r); > + > +

Re: [Cluster-devel] Question on LVB when the node that held EX lock crash

2016-11-16 Thread David Teigland
On Wed, Nov 16, 2016 at 04:42:09PM +0800, Eric Ren wrote: > On 11/16/2016 04:29 PM, Eric Ren wrote: > > Hi David and all, > > > > I am debugging an issue of ocfs2 that relates to LVB value. I will try > > to make it a pure DLM question: > > > > Two nodes (N1, N2) try to truncate the same

Re: [Cluster-devel] [DLM PATCH] DLM: Don't specify WQ_UNBOUND for the ast callback workqueue

2016-10-19 Thread David Teigland
On Wed, Oct 19, 2016 at 11:34:54AM -0400, Bob Peterson wrote: > Hi, > > This patch removes the WQ_UNBOUND flag (which implies WQ_HIGHPRI) > from the DLM's ast work queue, in favor of just WQ_HIGHPRI. > This has been shown to cause a 19 percent performance increase for > simultaneous inode creates

Re: [Cluster-devel] About dlm_unlock (kernel space)

2016-06-13 Thread David Teigland
On Mon, Jun 13, 2016 at 07:15:09AM -0400, Guoqing Jiang wrote: > Hi, > > In case we have set DLM_LKF_CONVERT flag for dlm_lock, is it > possible that the convert > queue could be NULL or not NULL while perform unlock? I think there > are two different > cases would appear when call dlm_unlock: >

Re: [Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful

2016-05-18 Thread David Teigland
On Wed, May 18, 2016 at 02:53:00PM +0800, Eric Ren wrote: > Q1: what's stateful merged node? > Q2: what if we add the stateful merged nodes to dlm_controld daemon > cpg instead of fencing them? The details here are fundamental to the way dlm works because the dlm depends on the properties of

Re: [Cluster-devel] [DLM PATCH] dlm_controld: add option of enable_force_kick

2016-05-16 Thread David Teigland
On Mon, May 16, 2016 at 04:07:18PM +0800, Eric Ren wrote: > When there are 3 or more partitions that merge, none may see enough > clean nodes. Therefore, DLM would be stuck there forever until administrator > manually reset/restart enough nodes to produce sufficient clean nodes. > However,

Re: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection

2016-05-16 Thread David Teigland
On Mon, May 16, 2016 at 03:44:27PM +0800, Eric Ren wrote: > Thanks! Hum, according to the long comments, you've handled the 2/2 > even split by way of the low nodeid killing stateful merged > numbers. Interesting, I'd forgotten about that bit of code, so I was wrong to say that we do nothing

Re: [Cluster-devel] inconsistent dlm_new_lockspace LVB_LEN size from ocfs2 user-space tool and ocfs2 kernel module

2016-05-13 Thread David Teigland
On Fri, May 13, 2016 at 02:36:25AM -0600, Gang He wrote: > Here is a inconsistent LVB_LEN size problem when create a new lockspace > from user-space tool (e.g. fsck.ocfs2) and kernel module (e.g. > ocfs2/stack_user.c). > From the userspace tool, the LVB size is DLM_USER_LVB_LEN (32 bytes, >

Re: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection

2016-05-13 Thread David Teigland
On Fri, May 13, 2016 at 01:45:47PM +0800, Eric Ren wrote: > >the cluster. Neither option is good. In the past we decided to let the > >cluster sit in this state so an admin could choose which nodes to remove. > >Do you prefer the alternative of kicking nodes in this case (with somewhat >

Re: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection

2016-05-12 Thread David Teigland
On Thu, May 12, 2016 at 05:16:08PM +0800, Eric Ren wrote: > DLM would be stuck in "need fencing" state, although cluster can > regain quorum very quickly after a network transient disconnection. > > It's possible that this process happens within one monoclock. It > means

Re: [Cluster-devel] [DLM PATCH 0/6] Misc DLM Improvements Regarding Socket Errors

2016-02-15 Thread David Teigland
On Mon, Feb 15, 2016 at 04:16:17PM -0500, Bob Peterson wrote: > I think the "right thing to do" at this point is this: > > 1. Patch #1 is already upstream > 2. Patch #2 stands on its own, so I think this should go forward. > 3. Combine patches 3, 4 and 5, which ought to provide a comprehensive

Re: [Cluster-devel] [DLM PATCH 0/6] Misc DLM Improvements Regarding Socket Errors

2016-02-11 Thread David Teigland
On Thu, Feb 11, 2016 at 01:39:09PM -0500, Bob Peterson wrote: > The problem is: While testing the dlm in multiple recovery situations, > Nate and I discovered multiple problems. Until recently, no one has tried > to run recovery tests on an upstream DLM, (Let's distinguish tcp connection

Re: [Cluster-devel] [DLM PATCH 0/6] Misc DLM Improvements Regarding Socket Errors

2016-02-11 Thread David Teigland
ob Peterson <rpete...@redhat.com> Signed-off-by: David Teigland <teigl...@redhat.com> Could we begin with one patch that's easy to track that directly resolves the issues with that commit (perhaps even a revert if it's not simple to fix directly)? That brings us back to a known-good

Re: [Cluster-devel] DLM Shutdown

2016-02-10 Thread David Teigland
On Wed, Feb 10, 2016 at 02:33:49AM +0100, Andreas Gruenbacher wrote: > never actively releases existing lockspaces. This means that as soon > as any application creates the default lockspace (via libdlm), or if > an application doesn't release any lockspaces it creates, dlm_controld > will never

Re: [Cluster-devel] DLM Shutdown

2016-02-10 Thread David Teigland
On Wed, Feb 10, 2016 at 09:38:58PM +0100, Andreas Gruenbacher wrote: > On Wed, Feb 10, 2016 at 9:18 PM, David Teigland <teigl...@redhat.com> wrote: > > On Wed, Feb 10, 2016 at 08:48:12PM +0100, Andreas Gruenbacher wrote: > >> When a shutdown is requested, shouldn't dlm

Re: [Cluster-devel] DLM Shutdown

2016-02-10 Thread David Teigland
On Wed, Feb 10, 2016 at 08:48:12PM +0100, Andreas Gruenbacher wrote: > When a shutdown is requested, shouldn't dlm_controld really release > lockspaces in a similar way as well? You could probably do that if you check that the lockspace is managing no local locks (which would be a pain). If

Re: [Cluster-devel] [GFS2 PATCH 1/2] GFS2: Make gfs2_clear_inode() queue the final put

2015-12-04 Thread David Teigland
On Fri, Dec 04, 2015 at 09:51:53AM -0500, Bob Peterson wrote: > it's from the fenced process, and if so, queue the final put. That should > mitigate the problem. Bob, I'm perplexed by the focus on fencing; this issue is broader than fencing as I mentioned in bz 1255872. Over the years that I've

Re: [Cluster-devel] [PATCH 17/23] dlm: use per-attribute show and store methods

2015-09-28 Thread David Teigland
On Fri, Sep 25, 2015 at 06:49:54AM -0700, Christoph Hellwig wrote: > Signed-off-by: Christoph Hellwig > --- > fs/dlm/config.c | 288 > +++- > 1 file changed, 74 insertions(+), 214 deletions(-) Looks good to me. Dave

Re: [Cluster-devel] PROBLEM: dlm: BUG_ON on con-nodeid == 0 when connect from unknown address

2015-08-03 Thread David Teigland
On Mon, Aug 03, 2015 at 07:20:55PM +0800, tan...@zte.com.cn wrote: When using SCTP protocol in dlm and it received connecting request from unknown address, the function receive_from_sock may directly shutdown the connection through process_sctp_notification. If still messages received from

Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-11 Thread David Teigland
On Thu, Jun 11, 2015 at 05:47:28PM +0800, Guoqing Jiang wrote: Do you consider take the following clean up? If yes, I will send a formal patch, otherwise pls ignore it. On first glance, the old and new code do not appear to do the same thing, so let's leave it as it is. - to_nodeid =

Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-10 Thread David Teigland
On Wed, Jun 10, 2015 at 11:10:44AM +0800, Guoqing Jiang wrote: The remove_from_waiters could only be invoked after failed to create_message, right? Since send_message always returns 0, this patch doesn't touch anything about the failure path, and it also doesn't change the original semantic.

Re: [Cluster-devel] Kernel crash at DLM: kernel BUG at /usr/src/packages/BUILD/dlm-1.6.fio/obj/default/lowcomms.c:715!

2015-01-29 Thread David Teigland
On Thu, Jan 29, 2015 at 03:50:58AM +0000, Pralay Dakua wrote: 645 static int receive_from_sock(struct connection *con) 646 { 704 705 /* Process SCTP notifications */ 706 if (msg.msg_flags & MSG_NOTIFICATION) { 707 msg.msg_control = incmsg; 708

Re: [Cluster-devel] [RFA][PATCH 5/8] dlm: Remove seq_printf() return checks and use seq_has_overflowed()

2014-11-04 Thread David Teigland
On Tue, Nov 04, 2014 at 08:08:52AM -0500, Steven Rostedt wrote: On Wed, 29 Oct 2014 17:56:07 -0400 Steven Rostedt rost...@goodmis.org wrote: From: Joe Perches j...@perches.com [ REQUEST FOR ACKS ] Can any of the DLM maintainers give me an Acked-by for this? Looks ok, Dave

Re: [Cluster-devel] [DLM PATCH] DLM: Don't wait for resource library lookups if NOLOOKUP is specified

2014-10-01 Thread David Teigland
On Wed, Oct 01, 2014 at 01:21:41PM -0400, Bob Peterson wrote: Hi, This patch adds a new lock flag, DLM_LKF_NOLOOKUP, which instructs DLM to refrain from sending lookup requests in cases where the lock library node is not the current node. This is similar to the DLM_LKF_NOQUEUE flag, except

Re: [Cluster-devel] [RFC PATCH] dlm: Remove unused conf from lm_grant

2014-07-01 Thread David Teigland
On Tue, Jul 01, 2014 at 10:43:13AM -0400, Jeff Layton wrote: On Tue, 01 Jul 2014 06:20:10 -0700 Joe Perches j...@perches.com wrote: While doing a bit of adding argument names to fs.h, I looked at lm_grant and it seems the 2nd argument is always NULL. How about removing it? This

Re: [Cluster-devel] [RFC PATCH] dlm: Remove unused conf from lm_grant

2014-07-01 Thread David Teigland
On Tue, Jul 01, 2014 at 01:16:32PM -0400, Bob Peterson wrote: - Original Message - On Tue, Jul 01, 2014 at 10:43:13AM -0400, Jeff Layton wrote: On Tue, 01 Jul 2014 06:20:10 -0700 Joe Perches j...@perches.com wrote: While doing a bit of adding argument names to fs.h, I

Re: [Cluster-devel] [PATCH] dlm_controld: fix name printing error in logging

2014-04-25 Thread David Teigland
On Fri, Apr 25, 2014 at 03:21:48PM +0800, Lidong Zhong wrote: When the length of name_in is NAME_ID_SIZE, the last byte of the name and a whitespace will get lost. Thanks, I modified your patch to handle longer names also... commit 4283123f0b13eafc46d825050c5142cf44be79c3 Author: Lidong Zhong

Re: [Cluster-devel] CMAN/DLM without SCTP

2014-02-26 Thread David Teigland
On Wed, Feb 26, 2014 at 04:52:14PM +0530, Pratik Mehta wrote: On Wed, Feb 19, 2014 at 11:43 PM, David Teigland teigl...@redhat.com wrote: That's a fine solution. You might also be able to use 'service cman start quorum'. Apart from DLM, wouldn't this prevent fenced from starting

Re: [Cluster-devel] [PATCH] dlm: Avoid that dlm_release_lockspace() incorrectly returns -EBUSY

2013-10-16 Thread David Teigland
On Wed, Oct 16, 2013 at 02:20:25PM +0200, Bart Van Assche wrote: When dlm_release_lockspace(ls, 1) is invoked on a busy system I've pushed this to the next branch. Thanks, Dave

Re: [Cluster-devel] [patch] dlm: some checks can underflow

2013-07-31 Thread David Teigland
On Wed, Jul 31, 2013 at 12:02:29PM +0300, Dan Carpenter wrote: This is a static checker fix. We have several places here that check the upper limit without checking for negative numbers. One example of this is in find_rsb(). My static checker marks endian data as user controlled so. The
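
The class of bug described in this snippet (an upper-limit check on a signed value with no check for negative numbers) can be sketched in user-space C as follows. This is only an illustration under stated assumptions: EXAMPLE_MAX_NAME_LEN and both function names are hypothetical and are not taken from fs/dlm or from the patch itself.

```c
#include <stdbool.h>

/* Hypothetical limit for illustration; not a value taken from fs/dlm. */
#define EXAMPLE_MAX_NAME_LEN 64

/* Checks only the upper bound, so a negative length is accepted. */
static bool len_ok_upper_only(int len)
{
    return len <= EXAMPLE_MAX_NAME_LEN;
}

/* Checks both bounds, so negative lengths are rejected as well. */
static bool len_ok_both_bounds(int len)
{
    return len >= 0 && len <= EXAMPLE_MAX_NAME_LEN;
}
```

With the first helper a length of -1 passes the check; with the second it is rejected, which is the kind of hardening the static checker is asking for.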

Re: [Cluster-devel] [PATCH] dlm: Avoid LVB truncation

2013-06-26 Thread David Teigland
On Wed, Jun 26, 2013 at 02:27:57PM +0200, Bart Van Assche wrote: For lockspaces with an LVB length above 64 bytes, avoid truncating the LVB while exchanging it with another node in the cluster. Thanks, I've added this to next. Dave

Re: [Cluster-devel] [PATCH 0/6] dlm: sctp use fixes

2013-06-14 Thread David Teigland
On Fri, Jun 14, 2013 at 04:56:08AM -0500, micha...@cs.wisc.edu wrote: The following patches made over Linus's tree fix a handful of bugs that occur when the initial IP addr cannot be reached when using SCTP. Thanks Mike, I've pushed these to the linux-dlm next branch. Dave

[Cluster-devel] [PATCH] gfs2: add native setup to man page

2013-05-14 Thread David Teigland
List the simplest sequence of steps to manually set up and run gfs2/dlm. Signed-off-by: David Teigland teigl...@redhat.com --- gfs2/man/gfs2.5 | 188 1 file changed, 188 insertions(+) diff --git a/gfs2/man/gfs2.5 b/gfs2/man/gfs2.5 index

[Cluster-devel] [PATCH] gfs2: add native setup to man page

2013-05-13 Thread David Teigland
List the simplest sequence of steps to manually set up and run gfs2/dlm. Signed-off-by: David Teigland teigl...@redhat.com --- gfs2/man/gfs2.5 | 188 1 file changed, 188 insertions(+) diff --git a/gfs2/man/gfs2.5 b/gfs2/man/gfs2.5 index

Re: [Cluster-devel] linux-next: Tree for May 8 (dlm)

2013-05-09 Thread David Teigland
On Thu, May 09, 2013 at 09:47:45AM +1000, Stephen Rothwell wrote: [Just forwarding to David ...] On Wed, 08 May 2013 11:04:45 -0700 Randy Dunlap rdun...@infradead.org wrote: on x86_64: when CONFIG_GFS2_FS_LOCKING_DLM=y and CONFIG_DLM=m: fs/built-in.o: In function `gfs2_lock':

Re: [Cluster-devel] [PATCH] dlm_tool: Trimming garbages at in Expecting reply output

2013-05-02 Thread David Teigland
On Thu, May 02, 2013 at 09:19:21PM +0900, Masatake YAMATO wrote: The buffer used in Expecting reply of dlm_tool lockdebug output is used as C string (via printf %s) but not terminated with nul char. Yes, thanks. This was fixed for some time in dlm.git. I'm afraid we'll need to go through the
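
The problem reported here (a buffer printed with %s although it carries no terminating NUL) can be avoided by copying with an explicit length and terminating the copy. The sketch below is a generic user-space illustration; copy_terminated is a hypothetical helper and is not the fix that went into dlm.git.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy at most src_len bytes from a buffer that may lack a trailing NUL
 * into dst, and always terminate dst when it has room for at least one
 * byte. Returns the number of bytes copied, not counting the terminator.
 */
static size_t copy_terminated(char *dst, size_t dst_size,
                              const char *src, size_t src_len)
{
    size_t n;

    if (dst_size == 0)
        return 0;

    /* Leave one byte of dst for the terminator. */
    n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

An alternative that needs no copy is to print with a bounded precision, as in printf("%.*s", (int)len, buf), which stops after len bytes even if no NUL is present.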

Re: [Cluster-devel] GFS2: Pull request (fixes)

2013-04-05 Thread David Teigland
On Fri, Apr 05, 2013 at 11:34:45AM +0100, Steven Whitehouse wrote: Please consider pulling the following changes, There's some mixup here that should be cleared up first. David Teigland (2): GFS2: Fix unlock of fcntl locks during withdrawn state Steven Whitehouse (1): GFS2

[Cluster-devel] [PATCH] gfs2: use kmalloc for lvb bitmap

2013-03-05 Thread David Teigland
The temp lvb bitmap was on the stack, which could be an alignment problem for __set_bit_le. Use kmalloc for it instead. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/incore.h | 1 + fs/gfs2/lock_dlm.c | 31 ++- 2 files changed, 19 insertions(+), 13

Re: [Cluster-devel] DLM regression in 64-bit 3.7.x Kernel?

2013-02-25 Thread David Teigland
On Tue, Feb 19, 2013 at 11:55:14AM +0100, Jacek Konieczny wrote: Hi, I have recently upgraded my development cluster from 3.6.x to 3.7.1 kernel and clvmd stopped working (all locking operation result in 'Invalid argument'). I have traced the problem to this call: write(8,

Re: [Cluster-devel] [PATCH] idr: fix a subtle bug in idr_get_next()

2013-02-05 Thread David Teigland
-off-by: Tejun Heo t...@kernel.org Reported-by: David Teigland teigl...@redhat.com Cc: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com David, can you please test whether the patch makes the skipped deletion bug go away? Yes, I've tested, and it works fine now. Thanks, Dave

Re: [Cluster-devel] [PATCH 10/14] dlm: don't use idr_remove_all()

2013-02-01 Thread David Teigland
On Thu, Jan 31, 2013 at 04:18:41PM -0800, Tejun Heo wrote: It looks a bit weird to me that ls->ls_recover_list_count is also incremented by recover_list_add(). The two code paths don't seem to be interlocked at least upon my very shallow glance. Is it that only either the list or idr is in

Re: [Cluster-devel] [PATCH 10/14] dlm: don't use idr_remove_all()

2013-01-30 Thread David Teigland
On Tue, Jan 29, 2013 at 10:13:17AM -0500, David Teigland wrote: On Mon, Jan 28, 2013 at 10:57:23AM -0500, David Teigland wrote: On Fri, Jan 25, 2013 at 05:31:08PM -0800, Tejun Heo wrote: idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated

Re: [Cluster-devel] [PATCH 10/14] dlm: don't use idr_remove_all()

2013-01-29 Thread David Teigland
On Mon, Jan 28, 2013 at 10:57:23AM -0500, David Teigland wrote: On Fri, Jan 25, 2013 at 05:31:08PM -0800, Tejun Heo wrote: idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. The conversion isn't completely trivial for recover_idr_clear() as it's the only

Re: [Cluster-devel] [PATCH 09/14] dlm: use idr_for_each_entry() in recover_idr_clear() error path

2013-01-28 Thread David Teigland
tested. Signed-off-by: Tejun Heo t...@kernel.org Cc: Christine Caulfield ccaul...@redhat.com Cc: David Teigland teigl...@redhat.com Cc: cluster-devel@redhat.com --- This patch depends on an earlier idr patch and I think it would be best to route these together through -mm. Christine, David

Re: [Cluster-devel] [PATCH 10/14] dlm: don't use idr_remove_all()

2013-01-28 Thread David Teigland
idr_destroy(). Replace it with idr_remove() call inside idr_for_each_entry() loop. It goes on top so that it matches the operation order in recover_idr_del(). Only compile tested. Signed-off-by: Tejun Heo t...@kernel.org Cc: Christine Caulfield ccaul...@redhat.com Cc: David Teigland teigl

[Cluster-devel] [PATCH 2/2] gfs2: remove redundant lvb pointer

2012-11-14 Thread David Teigland
The lksb struct already contains a pointer to the lvb, so another directly from the glock struct is not needed. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/glock.c| 10 -- fs/gfs2/incore.h |1 - fs/gfs2/lock_dlm.c |8 fs/gfs2/quota.c|6

[Cluster-devel] [PATCH 1/2] gfs2: only use lvb on glocks that need it

2012-11-14 Thread David Teigland
Save the effort of allocating, reading and writing the lvb for most glocks that do not use it. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/glock.c| 27 +-- fs/gfs2/glops.c|3 ++- fs/gfs2/incore.h |3 ++- fs/gfs2/lock_dlm.c | 12

[Cluster-devel] [PATCH] gfs2: skip dlm_unlock calls in unmount

2012-11-13 Thread David Teigland
is called because it may update the lvb of the resource. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/glock.c|1 + fs/gfs2/incore.h |1 + fs/gfs2/lock_dlm.c |8 3 files changed, 10 insertions(+) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index e6c2fd5

Re: [Cluster-devel] [PATCH] gfs2: skip dlm_unlock calls in unmount

2012-11-12 Thread David Teigland
On Mon, Nov 12, 2012 at 10:44:36AM +0000, Steven Whitehouse wrote: - save 64 bytes of memory for every local lock (32 in gfs2_glock, 32 in dlm_rsb) - save 96 bytes of memory for every remote lock (32 in gfs2_glock, 32 in local dlm_rsb, 32 in remote dlm_lkb) - save 32 bytes of

Re: [Cluster-devel] [PATCH] gfs2: skip dlm_unlock calls in unmount

2012-11-09 Thread David Teigland
On Fri, Nov 09, 2012 at 09:45:17AM +0000, Steven Whitehouse wrote: + if (test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags) && + (!gl->gl_lvb[0] || gl->gl_state != LM_ST_EXCLUSIVE)) { I'm still not happy with using !gl->gl_lvb[0] to determine whether the LVB is in use or not. I think we need a

Re: [Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread David Teigland
On Thu, Nov 08, 2012 at 10:26:53AM +0000, Steven Whitehouse wrote: Hi, On Wed, 2012-11-07 at 14:14 -0500, David Teigland wrote: When unmounting, gfs2 does a full dlm_unlock operation on every cached lock. This can create a very large amount of work and can take a long time to complete

Re: [Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread David Teigland
On Thu, Nov 08, 2012 at 06:48:19PM +0000, Steven Whitehouse wrote: Converting to NL would actually be less expensive than unlock because the NL convert does not involve a reply message, but unlock does. I'm not entirely sure I follow... at least from the filesystem point of view (and

[Cluster-devel] [PATCH] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread David Teigland
is called because it may update the lvb of the resource. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/glock.c|1 + fs/gfs2/incore.h |1 + fs/gfs2/lock_dlm.c |8 3 files changed, 10 insertions(+) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index e6c2fd5

[Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-07 Thread David Teigland
is called because it may update the lvb of the resource. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/glock.c|1 + fs/gfs2/incore.h |1 + fs/gfs2/lock_dlm.c |6 ++ 3 files changed, 8 insertions(+) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index e6c2fd5..f3a5edb

Re: [Cluster-devel] cluster4 dlm dlm_stonith - should it really fence by turning node off?

2012-11-05 Thread David Teigland
On Sat, Nov 03, 2012 at 03:58:28PM +0100, Jacek Konieczny wrote: Hello, The dlm_stonith fencing helper is really convenient when Pacemaker is in use. Though, it doesn't quite work as I would expect - when fencing is needed it requests a node to be turned off instead of rebooting. And it

Re: [Cluster-devel] cluster4 dlm: startup notification for systemd

2012-11-05 Thread David Teigland
On Sat, Nov 03, 2012 at 04:27:54PM +0100, Jacek Konieczny wrote: Hello, The two patches: [PATCH 1/2] --foreground option added to dlm_controld [PATCH 2/2] Startup notification by sd_notify() add startup notification for the systemd service unit. This way startup of services

Re: [Cluster-devel] [PATCH] dlm_stonith_{off, reboot} aliases for fence helper

2012-11-05 Thread David Teigland
On Mon, Nov 05, 2012 at 07:05:22PM +0100, Jacek Konieczny wrote: - rv = stonith_api_kick_helper(nodeid, 300, 1); + rv = stonith_api_kick_helper(nodeid, 300, turn_off); I'd like it to be reboot, but seeing the arg as bool off I figured the opposite would be on ... if you're saying that

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 09:25:08AM +0000, Dietmar Maurer wrote: So the observed behavior is expected? Yes, it's a stateful partition merge, and I think /var/log/messages should have mentioned something about that. When a node is partitioned from the others (e.g. network disconnected), it has

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 04:12:10PM +0000, Dietmar Maurer wrote: Yes, it's a stateful partition merge, and I think /var/log/messages should have mentioned something about that. When a node is partitioned from the others (e.g. network disconnected), it has to be cleanly reset before it's

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 04:26:35PM +0000, Dietmar Maurer wrote: I guess you're talking about the dlm_tool ls output? Yes. The fencing there means it is waiting for fenced to finish fencing before it starts dlm recovery. fenced waits for quorum. So who actually starts fencing

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 04:55:55PM +0000, Dietmar Maurer wrote: The difficult cases, which I think you're seeing, are partitions where no group has quorum, e.g. 2/2. In this case we do nothing, and the user has to resolve it by resetting some of the nodes The problem with that is that

Re: [Cluster-devel] [PATCH] dlm: check the maximum size of a request from user

2012-09-10 Thread David Teigland
On Sun, Sep 09, 2012 at 04:16:58PM +0200, Sasha Levin wrote: device_write only checks whether the request size is big enough, but it doesn't check if the size is too big. At that point, it also tries to allocate as much memory as the user has requested even if it's too much. This can
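
The check described in this snippet (rejecting a request that is too small but not one that is too large, and then allocating whatever size the caller asked for) can be illustrated with a small user-space sketch. The bounds and the function name below are hypothetical and chosen only for the example; they are not the limits used by the actual patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical bounds for illustration; not taken from the kernel source. */
#define EXAMPLE_MIN_REQUEST 32
#define EXAMPLE_MAX_REQUEST 4096

/*
 * Validate a caller-supplied request size before allocating a buffer for it.
 * Returns 0 when the size is acceptable and -EINVAL otherwise.
 */
static int check_request_size(size_t count)
{
    if (count < EXAMPLE_MIN_REQUEST)
        return -EINVAL;   /* too small to hold a full request */
    if (count > EXAMPLE_MAX_REQUEST)
        return -EINVAL;   /* refuse oversized, caller-controlled allocations */
    return 0;
}
```

Doing both comparisons before the allocation keeps the amount of memory requested on behalf of the caller within a known range.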

Re: [Cluster-devel] tasks of dlm_recoverd?

2012-08-27 Thread David Teigland
On Mon, Aug 27, 2012 at 01:43:22PM +0200, Heiko Nardmann wrote: Hi together! During the shutdown of my second cluster node (two node cluster) I have seen a process 'dlm_recoverd' running with 100% CPU usage for about 6 minutes. It's just that I have no idea what is the task of this

Re: [Cluster-devel] [PATCH] cman init: allow dlm hash table sizes to be tunable at startup

2012-07-25 Thread David Teigland
On Wed, Jul 25, 2012 at 07:32:28AM +0200, Fabio M. Di Nitto wrote: From: Fabio M. Di Nitto fdini...@redhat.com Resolves: rhbz#842370 looks good, thanks +# DLM_LKBTBL_SIZE - DLM_RSBTBL_SIZE - DLM_DIRTBL_SIZE +# Allow tuning of DLM kernel hash table sizes. +# do NOT change unless instructed

Re: [Cluster-devel] [patch] dlm: remove stray unlock

2012-05-21 Thread David Teigland
On Mon, May 21, 2012 at 05:35:26PM +0300, Dan Carpenter wrote: Smatch complains that we unlock this twice. It looks like an accidental to me. Thanks, will fix that.

Re: [Cluster-devel] [patch] dlm: NULL dereference on failure in kmem_cache_create()

2012-05-15 Thread David Teigland
On Tue, May 15, 2012 at 11:58:12AM +0300, Dan Carpenter wrote: We aren't allowed to pass NULL pointers to kmem_cache_destroy() so if both allocations fail, it leads to a NULL dereference. thanks, added that to next branch.

Re: [Cluster-devel] GPF in dlm_lowcomms_stop

2012-05-04 Thread David Teigland
On Fri, May 04, 2012 at 11:33:17AM -0600, dann frazier wrote: On Fri, Mar 30, 2012 at 11:17:56AM -0600, dann frazier wrote: On Fri, Mar 30, 2012 at 12:42:40PM -0400, David Teigland wrote: On Fri, Mar 30, 2012 at 11:42:56AM -0400, David Teigland wrote: Hi Dan, I'm not very familiar

Re: [Cluster-devel] [GFS2 PATCH] GFS2: Instruct DLM to avoid queue convert slowdowns

2012-04-10 Thread David Teigland
On Tue, Apr 10, 2012 at 10:12:28AM +0100, Steven Whitehouse wrote: Hi, On Thu, 2012-04-05 at 12:11 -0400, Bob Peterson wrote: Hi, Here's another patch (explanation below). This patch relies upon a DLM patch that hasn't fully gone upstream yet, so perhaps it shouldn't be added to

Re: [Cluster-devel] GPF in dlm_lowcomms_stop

2012-03-30 Thread David Teigland
On Wed, Mar 21, 2012 at 07:59:13PM -0600, dann frazier wrote: However... we've dropped the connections_lock, so it's possible that a new connection gets created on line 9. This connection structure would have pointers to the workqueues that we're about to destroy. Sometime later on we get data

Re: [Cluster-devel] GPF in dlm_lowcomms_stop

2012-03-30 Thread David Teigland
On Fri, Mar 30, 2012 at 11:42:56AM -0400, David Teigland wrote: Hi Dan, I'm not very familiar with this code either, but I've talked with Chrissie and she suggested we try something like this: A second version that addresses a potentially similar problem in start. diff --git a/fs/dlm

Re: [Cluster-devel] GFS2: Pre-pull patch posting (merge window)

2012-03-23 Thread David Teigland
on i386: ERROR: sctp_do_peeloff [fs/dlm/dlm.ko] undefined! GFS2_FS selects DLM (if GFS2_FS_LOCKING_DLM, which is enabled). GFS2_FS selects IP_SCTP if DLM_SCTP, which is not enabled and not used anywhere else in the kernel tree AFAICT. DLM just always selects IP_SCTP. Here's what we have

Re: [Cluster-devel] GFS2: Pre-pull patch posting (merge window)

2012-03-23 Thread David Teigland
On Fri, Mar 23, 2012 at 01:06:05PM -0700, Randy Dunlap wrote: GFS2_FS selects DLM (if GFS2_FS_LOCKING_DLM, which is enabled). GFS2_FS selects IP_SCTP if DLM_SCTP, which is not enabled and not used anywhere else in the kernel tree AFAICT. DLM just always selects IP_SCTP. Here's what we

Re: [Cluster-devel] last element of dlm_local_addr[] never used?

2012-03-21 Thread David Teigland
On Wed, Mar 21, 2012 at 12:24:35PM +0300, Dan Carpenter wrote: In fs/dlm/lowcomms.c we declare the dlm_local_addr[] array like this: static struct sockaddr_storage *dlm_local_addr[DLM_MAX_ADDR_COUNT]; But it looks like the last element of the array is never used: 1072 /* Get local

Re: [Cluster-devel] [PATCH] fs/dlm/rcom.c: included member.h twice

2012-02-16 Thread David Teigland
On Thu, Feb 16, 2012 at 02:55:21PM +0100, Danny Kukawka wrote: fs/dlm/rcom.c included 'member.h' twice, remove the duplicate. I'll fold this into the current patch I'm working on. Signed-off-by: Danny Kukawka danny.kuka...@bisect.de --- fs/dlm/rcom.c |1 - 1 files changed, 0

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2012-01-09 Thread David Teigland
On Mon, Jan 09, 2012 at 04:36:30PM +, Steven Whitehouse wrote: On Thu, 2012-01-05 at 10:46 -0600, David Teigland wrote: This new method of managing recovery is an alternative to the previous approach of using the userland gfs_controld. - use dlm slot numbers to assign journal id's

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2012-01-09 Thread David Teigland
On Mon, Jan 09, 2012 at 11:46:26AM -0500, David Teigland wrote: On Mon, Jan 09, 2012 at 04:36:30PM +, Steven Whitehouse wrote: On Thu, 2012-01-05 at 10:46 -0600, David Teigland wrote: This new method of managing recovery is an alternative to the previous approach of using the userland

[Cluster-devel] gfs2: let spectator mount do read only recovery

2012-01-09 Thread David Teigland
on a read only block device. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/incore.h |1 + fs/gfs2/ops_fstype.c |2 +- fs/gfs2/recovery.c |4 +++- 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index 9182a87

[Cluster-devel] gfs2: fail mount if journal recovery fails

2012-01-09 Thread David Teigland
If the first mounter fails to recover one of the journals during mount, the mount should fail. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/incore.h |1 + fs/gfs2/recovery.c |3 ++- 2 files changed, 3 insertions(+), 1 deletions(-) diff --git a/fs/gfs2/incore.h b/fs

[Cluster-devel] gfs2: dlm based recovery coordination

2012-01-09 Thread David Teigland
merge cycle would be good, but you can send it off whenever you think is right. Dave From 0fb2d7726b570c6a5eb289bac237fb384b9c6f0b Mon Sep 17 00:00:00 2001 From: David Teigland teigl...@redhat.com Date: Tue, 20 Dec 2011 17:03:04 -0600 Subject: [PATCH] gfs2: dlm based recovery coordination

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2012-01-05 Thread David Teigland
to initiate journal recovery | - use a dlm lock to determine the first node to mount fs | - use a dlm lock to track journals that need recovery | | Signed-off-by: David Teigland teigl...@redhat.com | --- | --- a/fs/gfs2/lock_dlm.c | +++ b/fs/gfs2/lock_dlm.c (snip) | +#include linux

Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination

2012-01-05 Thread David Teigland
On Thu, Jan 05, 2012 at 03:40:09PM +, Steven Whitehouse wrote: I think it would be a good plan to not send this last patch for the current merge window and let it settle for a bit longer. Running things so fine with the timing makes me nervous bearing in mind the number of changes, To

[Cluster-devel] [PATCH 1/5] dlm: convert rsb list to rb_tree

2012-01-05 Thread David Teigland
-by: David Teigland teigl...@redhat.com --- fs/dlm/debug_fs.c | 28 --- fs/dlm/dlm_internal.h |9 +++-- fs/dlm/lock.c | 87 +++- fs/dlm/lockspace.c| 23 + fs/dlm/recover.c | 21 +++ 5 files

[Cluster-devel] [PATCH 0/5] dlm and gfs2 patches for 3.3

2012-01-05 Thread David Teigland
in userland. This new feature is not used by current dlm_controld and gfs_controld daemons, but will be enabled by a new dlm_controld version under development. Bob Peterson (1): dlm: convert rsb list to rb_tree David Teigland (4): dlm: move recovery barrier calls dlm: add node slots
