Re: [Cluster-devel] [PATCH 2/2] MAINTAINERS: Update dlm mailing list

2023-09-05 Thread David Teigland
AINTAINERS b/MAINTAINERS > index caae31fb9741..946fcf6c8d77 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -6093,7 +6093,7 @@ F: include/video/udlfb.h > DISTRIBUTED LOCK MANAGER (DLM) > M: Christine Caulfield > M: David Teigland > -L: cluster-devel@redhat.co

[Cluster-devel] [GIT PULL] dlm updates for 6.5

2023-06-29 Thread David Teigland
Hi Linus, Please pull dlm updates from tag: git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-6.5 The dlm posix lock handling (for gfs2) has three notable changes: - Local pids returned from GETLK are no longer negated. A previous patch negating remote pids mistakenly

[Cluster-devel] [GIT PULL] dlm updates for 6.4

2023-04-25 Thread David Teigland
Hi Linus, Please pull dlm updates from tag: git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-6.4 Change summary: Remove some unused features (related to lock timeouts) that have been previously scheduled for removal. Fix a bug where the pending callback flag would be

[Cluster-devel] [GIT PULL] dlm updates for 6.3

2023-02-20 Thread David Teigland
Hi Linus, Please pull dlm updates from tag: git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-6.3 This patch set fixes some races in the lowcomms startup and shutdown code that were found by targeted stress testing that quickly and repeatedly joins and leaves lockspaces.

[Cluster-devel] [GIT PULL] dlm updates for 6.2

2022-12-12 Thread David Teigland
Hi Linus, Please pull dlm updates from tag: git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-6.2 These patches include the usual cleanups and minor fixes, removal of code that is no longer needed due to recent improvements, and improvements to processing large volumes of

Re: [Cluster-devel] [PATCH v2][next] dlm: Replace one-element array with flexible-array member

2022-10-11 Thread David Teigland
On Mon, Oct 10, 2022 at 03:35:24PM -0700, Kees Cook wrote: > On Mon, Oct 10, 2022 at 04:00:39PM -0500, David Teigland wrote: > > On Sat, Oct 08, 2022 at 09:03:28PM -0700, Kees Cook wrote: > > > On Sun, Oct 09, 2022 at 03:05:17PM +1300, Paulo Miguel Almeida wrote: > > > &

Re: [Cluster-devel] [PATCH v2][next] dlm: Replace one-element array with flexible-array member

2022-10-10 Thread David Teigland
On Sat, Oct 08, 2022 at 09:03:28PM -0700, Kees Cook wrote: > On Sun, Oct 09, 2022 at 03:05:17PM +1300, Paulo Miguel Almeida wrote: > > On Sat, Oct 08, 2022 at 05:18:35PM -0700, Kees Cook wrote: > > > This is allocating 1 more byte than before, since the struct size didn't > > > change. But this

[Cluster-devel] [GIT PULL] dlm updates for 6.1

2022-10-03 Thread David Teigland
Hi Linus, Please pull dlm updates from tag: git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-6.1 This set of commits includes: . Fix a couple races found with a new torture test. . Improve errors when api functions are used incorrectly. . Improve tracing for lock

Re: [Cluster-devel] [GIT PULL] dlm updates for 6.0

2022-08-01 Thread David Teigland
On Mon, Aug 01, 2022 at 09:17:30AM -0700, Linus Torvalds wrote: > But again: please don't rebase stuff you have already exposed to > others. It causes real issues. This was just one example of it. > > And if you *do* have to rebase for a real technical reason ("Oh, that > was a disaster, it

Re: [Cluster-devel] [GIT PULL] dlm updates for 6.0

2022-08-01 Thread David Teigland
On Mon, Aug 01, 2022 at 10:50:28AM -0500, David Teigland wrote: > On Mon, Aug 01, 2022 at 08:46:24AM -0700, Linus Torvalds wrote: > > On Mon, Aug 1, 2022 at 7:43 AM David Teigland wrote: > > > > > > (You can ignore the premature 5.20 pull request from som

Re: [Cluster-devel] [GIT PULL] dlm updates for 6.0

2022-08-01 Thread David Teigland
On Mon, Aug 01, 2022 at 08:46:24AM -0700, Linus Torvalds wrote: > On Mon, Aug 1, 2022 at 7:43 AM David Teigland wrote: > > > > (You can ignore the premature 5.20 pull request from some weeks ago.) > > Gaah. That was the first thing I pulled this morning because

[Cluster-devel] [GIT PULL] dlm updates for 6.0

2022-08-01 Thread David Teigland
Hi Linus, Please pull dlm updates from tag: git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-6.0 Changes in this set of commits: . Delay the cleanup of interrupted posix lock requests until the user space result arrives. Previously, the immediate cleanup would lead

Re: [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON

2022-02-17 Thread David Teigland
On Thu, Feb 17, 2022 at 01:36:44AM +0100, Andreas Gruenbacher wrote: > On Wed, Feb 16, 2022 at 5:16 PM Alexander Aring wrote: > > > > - spin_lock(_lock); > > > > - if (!list_empty(>list)) { > > > > - log_error(ls, "dlm_posix_lock: op on list %llx", > > > > -

Re: [Cluster-devel] [RFC PATCH dlm 00/10] dlm_controld config settings can be

2021-09-22 Thread David Teigland
On Wed, Sep 22, 2021 at 05:32:49PM +0800, heming.z...@suse.com wrote: > If there is no chance to add dynamic updating setting by run command. > Is it a good idea to add a parameter "-I", like "dlm_tool -I reload_config". > "-I" means directly change without reading from dlm.conf. > When users want

Re: [Cluster-devel] [RFC PATCH dlm 00/10] dlm_controld config settings can be

2021-09-21 Thread David Teigland
On Tue, Sep 21, 2021 at 02:38:45PM +0800, heming.z...@suse.com wrote: > But I am ok with the reload_config idea, it's more basic. > We could give dlm_controld a chance to change behavior on the fly. > If needed, I could file a new patch for feature "reload_config", can I do it? Yes, I'd welcome a

Re: [Cluster-devel] [RFC PATCH dlm 00/10] dlm_controld config settings can be

2021-09-20 Thread David Teigland
On Sun, Sep 19, 2021 at 02:43:12PM +0800, Heming Zhao wrote: > This new feature gives dlm ability to change config settings dynamically. Hi Heming, Letting dlm_controld reload certain settings from dlm.conf makes sense, but I'd like something more basic. Let the user edit dlm.conf, then run

Re: [Cluster-devel] [BUG] fs: dlm: possible ABBA deadlock

2021-08-19 Thread David Teigland
On Thu, Aug 19, 2021 at 04:54:57PM +0800, Jia-Ju Bai wrote: > Hello, > > My static analysis tool reports a possible ABBA deadlock in the dlm > filesystem in Linux 5.10: > > dlm_recover_waiters_pre() >   mutex_lock(>ls_waiters_mutex); --> line 5130 >   recover_convert_waiter() >    

Re: [Cluster-devel] Why does dlm_lock function fails when downconvert a dlm lock?

2021-08-16 Thread David Teigland
On Mon, Aug 16, 2021 at 09:41:18AM -0500, David Teigland wrote: > On Fri, Aug 13, 2021 at 02:49:04PM +0800, Gang He wrote: > > Hi David, > > > > On 2021/8/13 1:45, David Teigland wrote: > > > On Thu, Aug 12, 2021 at 01:44:53PM +0800, Gang He wrote: > > >

Re: [Cluster-devel] Why does dlm_lock function fails when downconvert a dlm lock?

2021-08-16 Thread David Teigland
On Fri, Aug 13, 2021 at 02:49:04PM +0800, Gang He wrote: > Hi David, > > On 2021/8/13 1:45, David Teigland wrote: > > On Thu, Aug 12, 2021 at 01:44:53PM +0800, Gang He wrote: > > > In fact, I can reproduce this problem stably. > > > I want to know if this error ha

Re: [Cluster-devel] Why does dlm_lock function fails when downconvert a dlm lock?

2021-08-12 Thread David Teigland
On Thu, Aug 12, 2021 at 01:44:53PM +0800, Gang He wrote: > In fact, I can reproduce this problem stably. > I want to know if this error happen is by our expectation? since there is > not any extreme pressure test. > Second, how should we handle these error cases? call dlm_lock function > again?

Re: [Cluster-devel] [PATCH -next] fs: dlm: fix missing unlock on error in accept_from_sock()

2021-03-29 Thread David Teigland
On Sat, Mar 27, 2021 at 04:37:04PM +0800, Yang Yingliang wrote: > Add the missing unlock before return from accept_from_sock() > in the error handling case. Thanks, applied to the next branch. Dave > Fixes: 6cde210a9758 ("fs: dlm: add helper for init connection") > Reported-by: Hulk Robot >

Re: [Cluster-devel] [PATCH 1/4] sctp: add sctp_sock_set_nodelay

2020-05-29 Thread David Teigland
On Fri, May 29, 2020 at 02:09:40PM +0200, Christoph Hellwig wrote: > Add a helper to directly set the SCTP_NODELAY sockopt from kernel space > without going through a fake uaccess. Ack, they look fine to me, thanks. Dave

Re: [Cluster-devel] is it ok to always pull in sctp for dlm, was: Re: [PATCH 27/33] sctp: export sctp_setsockopt_bindx

2020-05-14 Thread David Teigland
On Thu, May 14, 2020 at 12:40:40PM +0200, Christoph Hellwig wrote: > On Wed, May 13, 2020 at 03:00:58PM -0300, Marcelo Ricardo Leitner wrote: > > On Wed, May 13, 2020 at 08:26:42AM +0200, Christoph Hellwig wrote: > > > And call it directly from dlm instead of going through kernel_setsockopt. > >

Re: [Cluster-devel] [PATCH] dlm: no need to check return value of debugfs_create functions

2019-06-12 Thread David Teigland
On Wed, Jun 12, 2019 at 05:25:36PM +0200, Greg Kroah-Hartman wrote: > When calling debugfs functions, there is no need to ever check the > return value. The function can work or not, but the code logic should > never do something different based on this. Thanks, pushed to next branch in

Re: [Cluster-devel] [PATCH v2 1/3] dlm: check if workqueues are NULL before flushing/destroying

2019-04-05 Thread David Teigland
I tried these with one address and had no problem, so I've pushed them to the linux-dlm next branch. Dave

Re: [Cluster-devel] [GFS2 PATCH] gfs2: Panic when an io error occurs writing to the journal

2018-12-17 Thread David Teigland
On Mon, Dec 17, 2018 at 09:58:47AM -0500, Bob Peterson wrote: > Dave Teigland recommended. Unless I'm mistaken, Dave has said that GFS2 > should never withdraw; it should always just kernel panic (Dave, correct > me if I'm wrong). At least this patch confines that behavior to a small > subset of

Re: [Cluster-devel] lost idr_destroy for ls_recover_idr in release_lockspace() ?

2018-11-15 Thread David Teigland
On Thu, Nov 15, 2018 at 09:49:17AM +0300, Vasily Averin wrote: > Dear David, > I've noticed that release_lockspace() lacks idr_destroy(>ls_recover_idr), > though it is called on rollback in new_lockspace(). > > It seems for me it is not critical, and should not lead to any leaks, > however could

Re: [Cluster-devel] [PATCH 0/3] dlm: fix various incorrect behaviors

2018-11-07 Thread David Teigland
On Fri, Nov 02, 2018 at 02:18:19PM -0600, Tycho Andersen wrote: > Hi, > > here's a series to fix some bugs I noticed in the DLM. The third patch > in the series and maybe the first should probably go to stable, assuming > everyone agrees they're indeed bugs. > > Thanks, > > Tycho > > Tycho

Re: [Cluster-devel] How to enable daemon_debug for dlm_controld

2018-06-27 Thread David Teigland
On Wed, Jun 27, 2018 at 04:07:18PM +0800, Guoqing Jiang wrote: > But by default, seems dlm_controld just run with "-s 0". And I tried to add > "daemon_debug=1" to /etc/dlm/dlm.conf, > then dlm resource can't start at all. Could you tell me how to enable this > option? Thanks in advance! That

Re: [Cluster-devel] [PATCH] dlm: prompt the user SCTP is experimental

2018-04-02 Thread David Teigland
On Thu, Mar 22, 2018 at 10:27:56PM -0600, Gang He wrote: > Hello David, > > Do you agree to add this prompt to the user? > Since sometimes customers attempted to setup SCTP protocol with two rings, > but they could not get the expected result, then it maybe bring some concerns > to the

Re: [Cluster-devel] [ClusterLabs] DLM connection channel switch take too long time (> 5mins)

2018-03-08 Thread David Teigland
> I use active rrp_mode in corosync.conf and reboot the cluster to let the > configuration effective. > But, the about 5 mins hang in new_lockspace() function is still here. The last time I tested connection failures with sctp was several years ago, but I recall seeing similar problems. I had

Re: [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup

2017-10-09 Thread David Teigland
On Sat, Oct 07, 2017 at 03:26:11AM +0100, Al Viro wrote: > On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote: > > According to fs/dlm/lock.c, the kernel may sleep under a spinlock, > > and the function call path is: > > dlm_master_lookup (acquire the spinlock) > >

Re: [Cluster-devel] [PATCH] dlm/recoverd: recheck kthread_should_stop() before schedule()

2017-09-25 Thread David Teigland
On Mon, Sep 25, 2017 at 03:47:50PM +0800, Guoqing Jiang wrote: > Call schedule() here could make the thread miss wake > up from kthread_stop(), so it is better to recheck > kthread_should_stop() before call schedule(), a symptom > happened when I run indefinite test (which mostly created >

Re: [Cluster-devel] [PATCH 00/18] [try #2] DLM: dlm patches need review

2017-09-18 Thread David Teigland
The patches are now here for testing https://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git/log/?h=next Dave

Re: [Cluster-devel] [PATCH 00/18] [try #2] DLM: dlm patches need review

2017-09-13 Thread David Teigland
On Wed, Sep 13, 2017 at 10:51:06AM +0100, Steven Whitehouse wrote: > Hi, > > > On 12/09/17 09:54, tsutomu@toshiba.co.jp wrote: > > Hi, > > > > This series of patches (2nd version after previous review on August) is to > > fix various bugs. This patch set is against the mainline kernel. > >

Re: [Cluster-devel] [PATCH 13/18] [try #2] DLM: fix conversion deadlock when DLM_LKF_NODLCKWT flag is set

2017-09-12 Thread David Teigland
On Tue, Sep 12, 2017 at 09:01:31AM +, tsutomu@toshiba.co.jp wrote: > When the DLM_LKF_NODLCKWT flag was set, even if conversion deadlock > was detected, the caller of can_be_granted() was unknown. > We change the behavior of can_be_granted() and change it to detect > conversion deadlock

Re: [Cluster-devel] [PATCH 10/17] dlm: use schedule_timeout instead of schedule in dlm_recoverd

2017-08-22 Thread David Teigland
On Thu, Aug 17, 2017 at 11:40:13PM +, tsutomu@toshiba.co.jp wrote: > If you refer to other implementations in kernel, the following > modifications may be better. > The important thing is to call kthread_should_stop() after > set_current_state(TASK_INTERRUPTIBLE). How is this fix?

Re: [Cluster-devel] [PATCH 08/17] dlm: retry rcom when dlm_wait_function is timed out.

2017-08-09 Thread David Teigland
On Wed, Aug 09, 2017 at 05:50:31AM +, tsutomu@toshiba.co.jp wrote: > If a node sends a DLM_RCOM_STATUS command and an error occurs on the > receiving side, the DLM_RCOM_STATUS_REPLY response may not be returned. > We retransmitted the DLM_RCOM_STATUS command so that we do not wait for > an

Re: [Cluster-devel] [PATCH 10/17] dlm: use schedule_timeout instead of schedule in dlm_recoverd

2017-08-09 Thread David Teigland
On Wed, Aug 09, 2017 at 05:51:01AM +, tsutomu@toshiba.co.jp wrote: > When dlm_recoverd_stop() is called between kthread_should_stop() and > set_task_state(), dlm_recoverd will not wake up. This works, but have you looked elsewhere in the kernel for kthread examples we could copy that do a

Re: [Cluster-devel] [PATCH] dlm: use sock_create_lite inside tcp_accept_from_sock

2017-08-07 Thread David Teigland
On Mon, Aug 07, 2017 at 02:31:20PM +0800, Guoqing Jiang wrote: > To resolve the issue, we need to use sock_create_lite > instead of sock_create_kern, like commit 0933a578cd55 > ("rds: tcp: use sock_create_lite() to create the accept > socket") did. Thanks, this is now in linux-dlm next. Dave

Re: [Cluster-devel] [PATCH v2] fs/dlm: Fix kernel memory disclosure

2017-02-22 Thread David Teigland
On Wed, Feb 22, 2017 at 03:45:34PM +0800, Vlad Tsyrklevich wrote: > Hello, I wanted to ping the list and see if this could get a review: now pushed to linux-dlm.git > > Clear the 'unused' field and the uninitialized padding in 'lksb' to > > avoid leaking memory to userland in

Re: [Cluster-devel] Question on LVB when the node that held EX lock crash

2016-11-30 Thread David Teigland
On Wed, Nov 30, 2016 at 05:07:22PM +0800, Eric Ren wrote: > @@ -852,12 +868,19 @@ void dlm_recover_rsbs(struct dlm_ls *ls) > if (is_master(r)) { > if (rsb_flag(r, RSB_RECOVER_CONVERT)) > recover_conversion(r); > + > +

Re: [Cluster-devel] Question on LVB when the node that held EX lock crash

2016-11-16 Thread David Teigland
On Wed, Nov 16, 2016 at 04:42:09PM +0800, Eric Ren wrote: > On 11/16/2016 04:29 PM, Eric Ren wrote: > > Hi David and all, > > > > I am debugging an issue of ocfs2 that relates to LVB value. I will try > > to make it a pure DLM question: > > > > Two nodes (N1, N2) try to truncate the same

Re: [Cluster-devel] [DLM PATCH] DLM: Don't specify WQ_UNBOUND for the ast callback workqueue

2016-10-19 Thread David Teigland
On Wed, Oct 19, 2016 at 11:34:54AM -0400, Bob Peterson wrote: > Hi, > > This patch removes the WQ_UNBOUND flag (which implies WQ_HIGHPRI) > from the DLM's ast work queue, in favor of just WQ_HIGHPRI. > This has been shown to cause a 19 percent performance increase for > simultaneous inode creates

Re: [Cluster-devel] About dlm_unlock (kernel space)

2016-06-13 Thread David Teigland
On Mon, Jun 13, 2016 at 07:15:09AM -0400, Guoqing Jiang wrote: > Hi, > > In case we have set DLM_LKF_CONVERT flag for dlm_lock, is it > possible that the convert > queue could be NULL or not NULL while perform unlock? I think there > are two different > cases would appear when call dlm_unlock: >

Re: [Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful

2016-05-18 Thread David Teigland
On Wed, May 18, 2016 at 02:53:00PM +0800, Eric Ren wrote: > Q1: what's stateful merged node? > Q2: what if we add the stateful merged nodes to dlm_controld daemon > cpg instead of fencing them? The details here are fundamental to the way dlm works because the dlm depends on the properties of

Re: [Cluster-devel] [DLM PATCH] dlm_controld: add option of enable_force_kick

2016-05-16 Thread David Teigland
On Mon, May 16, 2016 at 04:07:18PM +0800, Eric Ren wrote: > When there are 3 or more partitions that merge, none may see enough > clean nodes. Therefore, DLM would be stuck there forever until administrator > manually reset/restart enough nodes to produce sufficient clean nodes. > However,

Re: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection

2016-05-16 Thread David Teigland
On Mon, May 16, 2016 at 03:44:27PM +0800, Eric Ren wrote: > Thanks! Hum, according to the long comments, you've handled the 2/2 > even split by way of the low nodeid killing stateful merged > numbers. Interesting, I'd forgotten about that bit of code, so I was wrong to say that we do nothing

Re: [Cluster-devel] inconsistent dlm_new_lockspace LVB_LEN size from ocfs2 user-space tool and ocfs2 kernel module

2016-05-13 Thread David Teigland
On Fri, May 13, 2016 at 02:36:25AM -0600, Gang He wrote: > Here is a inconsistent LVB_LEN size problem when create a new lockspace > from user-space tool (e.g. fsck.ocfs2) and kernel module (e.g. > ocfs2/stack_user.c). > From the userspace tool, the LVB size is DLM_USER_LVB_LEN (32 bytes, >

Re: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection

2016-05-13 Thread David Teigland
On Fri, May 13, 2016 at 01:45:47PM +0800, Eric Ren wrote: > >the cluster. Neither option is good. In the past we decided to let the > >cluster sit in this state so an admin could choose which nodes to remove. > >Do you prefer the alternative of kicking nodes in this case (with somewhat >

Re: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection

2016-05-12 Thread David Teigland
On Thu, May 12, 2016 at 05:16:08PM +0800, Eric Ren wrote: > DLM would be stuck in "need fencing" state, although cluster can > regain quorum very quickly after a network transient disconnection. > > It's possible that this process happens within one monoclock. It > means

Re: [Cluster-devel] [DLM PATCH 0/6] Misc DLM Improvements Regarding Socket Errors

2016-02-15 Thread David Teigland
On Mon, Feb 15, 2016 at 04:16:17PM -0500, Bob Peterson wrote: > I think the "right thing to do" at this point is this: > > 1. Patch #1 is already upstream > 2. Patch #2 stands on its own, so I think this should go forward. > 3. Combine patches 3, 4 and 5, which ought to provide a comprehensive

Re: [Cluster-devel] [DLM PATCH 0/6] Misc DLM Improvements Regarding Socket Errors

2016-02-11 Thread David Teigland
On Thu, Feb 11, 2016 at 01:39:09PM -0500, Bob Peterson wrote: > The problem is: While testing the dlm in multiple recovery situations, > Nate and I discovered multiple problems. Until recently, no one has tried > to run recovery tests on an upstream DLM, (Let's distinguish tcp connection

Re: [Cluster-devel] [DLM PATCH 0/6] Misc DLM Improvements Regarding Socket Errors

2016-02-11 Thread David Teigland
ob Peterson <rpete...@redhat.com> Signed-off-by: David Teigland <teigl...@redhat.com> Could we begin with one patch that's easy to track that directly resolves the issues with that commit (perhaps even a revert if it's not simple to fix directly)? That brings us back to a known-good

Re: [Cluster-devel] DLM Shutdown

2016-02-10 Thread David Teigland
On Wed, Feb 10, 2016 at 02:33:49AM +0100, Andreas Gruenbacher wrote: > never actively releases existing lockspaces. This means that as soon > as any application creates the default lockspace (via libdlm), or if > an application doesn't release any lockspaces it creates, dlm_controld > will never

Re: [Cluster-devel] DLM Shutdown

2016-02-10 Thread David Teigland
On Wed, Feb 10, 2016 at 09:38:58PM +0100, Andreas Gruenbacher wrote: > On Wed, Feb 10, 2016 at 9:18 PM, David Teigland <teigl...@redhat.com> wrote: > > On Wed, Feb 10, 2016 at 08:48:12PM +0100, Andreas Gruenbacher wrote: > >> When a shutdown is requested, shouldn't dlm

Re: [Cluster-devel] DLM Shutdown

2016-02-10 Thread David Teigland
On Wed, Feb 10, 2016 at 08:48:12PM +0100, Andreas Gruenbacher wrote: > When a shutdown is requested, shouldn't dlm_controld really release > lockspaces in a similar way as well? You could probably do that if you check that the lockspace is managing no local locks (which would be a pain). If

Re: [Cluster-devel] [GFS2 PATCH 1/2] GFS2: Make gfs2_clear_inode() queue the final put

2015-12-04 Thread David Teigland
On Fri, Dec 04, 2015 at 09:51:53AM -0500, Bob Peterson wrote: > it's from the fenced process, and if so, queue the final put. That should > mitigate the problem. Bob, I'm perplexed by the focus on fencing; this issue is broader than fencing as I mentioned in bz 1255872. Over the years that I've

Re: [Cluster-devel] [PATCH 17/23] dlm: use per-attribute show and store methods

2015-09-28 Thread David Teigland
On Fri, Sep 25, 2015 at 06:49:54AM -0700, Christoph Hellwig wrote: > Signed-off-by: Christoph Hellwig > --- > fs/dlm/config.c | 288 > +++- > 1 file changed, 74 insertions(+), 214 deletions(-) Looks good to me. Dave

Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-11 Thread David Teigland
On Thu, Jun 11, 2015 at 05:47:28PM +0800, Guoqing Jiang wrote: Do you consider take the following clean up? If yes, I will send a formal patch, otherwise pls ignore it. On first glance, the old and new code do not appear to do the same thing, so let's leave it as it is. - to_nodeid =

Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-10 Thread David Teigland
On Wed, Jun 10, 2015 at 11:10:44AM +0800, Guoqing Jiang wrote: The remove_from_waiters could only be invoked after failed to create_message, right? Since send_message always returns 0, this patch doesn't touch anything about the failure path, and it also doesn't change the original semantic.

Re: [Cluster-devel] Kernel crash at DLM: kernel BUG at /usr/src/packages/BUILD/dlm-1.6.fio/obj/default/lowcomms.c:715!

2015-01-29 Thread David Teigland
On Thu, Jan 29, 2015 at 03:50:58AM +, Pralay Dakua wrote: 645 static int receive_from_sock(struct connection *con) 646 { 704 705 /* Process SCTP notifications */ 706 if (msg.msg_flags & MSG_NOTIFICATION) { 707 msg.msg_control = incmsg; 708

Re: [Cluster-devel] [RFA][PATCH 5/8] dlm: Remove seq_printf() return checks and use seq_has_overflowed()

2014-11-04 Thread David Teigland
On Tue, Nov 04, 2014 at 08:08:52AM -0500, Steven Rostedt wrote: On Wed, 29 Oct 2014 17:56:07 -0400 Steven Rostedt rost...@goodmis.org wrote: From: Joe Perches j...@perches.com [ REQUEST FOR ACKS ] Can any of the DLM maintainers give me an Acked-by for this? Looks ok, Dave

Re: [Cluster-devel] [DLM PATCH] DLM: Don't wait for resource library lookups if NOLOOKUP is specified

2014-10-01 Thread David Teigland
On Wed, Oct 01, 2014 at 01:21:41PM -0400, Bob Peterson wrote: Hi, This patch adds a new lock flag, DLM_LKF_NOLOOKUP, which instructs DLM to refrain from sending lookup requests in cases where the lock library node is not the current node. This is similar to the DLM_LKF_NOQUEUE flag, except

Re: [Cluster-devel] [RFC PATCH] dlm: Remove unused conf from lm_grant

2014-07-01 Thread David Teigland
On Tue, Jul 01, 2014 at 10:43:13AM -0400, Jeff Layton wrote: On Tue, 01 Jul 2014 06:20:10 -0700 Joe Perches j...@perches.com wrote: While doing a bit of adding argument names to fs.h, I looked at lm_grant and it seems the 2nd argument is always NULL. How about removing it? This

Re: [Cluster-devel] [RFC PATCH] dlm: Remove unused conf from lm_grant

2014-07-01 Thread David Teigland
On Tue, Jul 01, 2014 at 01:16:32PM -0400, Bob Peterson wrote: - Original Message - On Tue, Jul 01, 2014 at 10:43:13AM -0400, Jeff Layton wrote: On Tue, 01 Jul 2014 06:20:10 -0700 Joe Perches j...@perches.com wrote: While doing a bit of adding argument names to fs.h, I

Re: [Cluster-devel] CMAN/DLM without SCTP

2014-02-26 Thread David Teigland
On Wed, Feb 26, 2014 at 04:52:14PM +0530, Pratik Mehta wrote: On Wed, Feb 19, 2014 at 11:43 PM, David Teigland teigl...@redhat.com wrote: That's a fine solution. You might also be able to use 'service cman start quorum'. Apart from DLM, wouldn't this prevent fenced from starting

Re: [Cluster-devel] [PATCH] dlm: Avoid that dlm_release_lockspace() incorrectly returns -EBUSY

2013-10-16 Thread David Teigland
On Wed, Oct 16, 2013 at 02:20:25PM +0200, Bart Van Assche wrote: When dlm_release_lockspace(ls, 1) is invoked on a busy system I've pushed this to the next branch. Thanks, Dave

Re: [Cluster-devel] [patch] dlm: some checks can underflow

2013-07-31 Thread David Teigland
On Wed, Jul 31, 2013 at 12:02:29PM +0300, Dan Carpenter wrote: This is a static checker fix. We have several places here that check the upper limit without checking for negative numbers. One example of this is in find_rsb(). My static checker marks endian data as user controlled so. The
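
[Illustration, not part of the archived message] The class of bug described above (an upper-limit check on a signed value with no check for negative numbers) can be sketched in plain userspace C. This is only a minimal sketch under stated assumptions: the limit EXAMPLE_MAX_LEN and both function names are hypothetical and are not taken from the dlm source or from the patch under discussion.

```c
#include <stdbool.h>

/* Hypothetical limit, chosen only for this illustration. */
#define EXAMPLE_MAX_LEN 64

/*
 * Checks only the upper limit. Because len is signed, a negative
 * value is below the limit and is wrongly accepted.
 */
bool len_ok_upper_only(int len)
{
	return len <= EXAMPLE_MAX_LEN;
}

/*
 * Checks both ends of the range, so negative lengths are rejected
 * along with lengths above the limit.
 */
bool len_ok_both_bounds(int len)
{
	return len >= 0 && len <= EXAMPLE_MAX_LEN;
}
```

With these definitions, a length of -1 passes len_ok_upper_only() but is rejected by len_ok_both_bounds(). Whether a given call site should add a lower-bound check or switch to an unsigned type is a separate design choice that this sketch does not settle.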

Re: [Cluster-devel] [PATCH] dlm: Avoid LVB truncation

2013-06-26 Thread David Teigland
On Wed, Jun 26, 2013 at 02:27:57PM +0200, Bart Van Assche wrote: For lockspaces with an LVB length above 64 bytes, avoid truncating the LVB while exchanging it with another node in the cluster. Thanks, I've added this to next. Dave

Re: [Cluster-devel] [PATCH 0/6] dlm: sctp use fixes

2013-06-14 Thread David Teigland
On Fri, Jun 14, 2013 at 04:56:08AM -0500, micha...@cs.wisc.edu wrote: The following patches made over Linus's tree fix a handful of bugs that occur when the initial IP addr cannot be reached when using SCTP. Thanks Mike, I've pushed these to the linux-dlm next branch. Dave

[Cluster-devel] [PATCH] gfs2: add native setup to man page

2013-05-14 Thread David Teigland
List the simplest sequence of steps to manually set up and run gfs2/dlm. Signed-off-by: David Teigland teigl...@redhat.com --- gfs2/man/gfs2.5 | 188 1 file changed, 188 insertions(+) diff --git a/gfs2/man/gfs2.5 b/gfs2/man/gfs2.5 index

[Cluster-devel] [PATCH] gfs2: add native setup to man page

2013-05-13 Thread David Teigland
List the simplest sequence of steps to manually set up and run gfs2/dlm. Signed-off-by: David Teigland teigl...@redhat.com --- gfs2/man/gfs2.5 | 188 1 file changed, 188 insertions(+) diff --git a/gfs2/man/gfs2.5 b/gfs2/man/gfs2.5 index

Re: [Cluster-devel] linux-next: Tree for May 8 (dlm)

2013-05-09 Thread David Teigland
On Thu, May 09, 2013 at 09:47:45AM +1000, Stephen Rothwell wrote: [Just forwarding to David ...] On Wed, 08 May 2013 11:04:45 -0700 Randy Dunlap rdun...@infradead.org wrote: on x86_64: when CONFIG_GFS2_FS_LOCKING_DLM=y and CONFIG_DLM=m: fs/built-in.o: In function `gfs2_lock':

Re: [Cluster-devel] [PATCH] dlm_tool: Trimming garbages at in Expecting reply output

2013-05-02 Thread David Teigland
On Thu, May 02, 2013 at 09:19:21PM +0900, Masatake YAMATO wrote: The buffer used in Expecting reply of dlm_tool lockdebug output is used as C string (via printf %s) but not terminated with nul char. Yes, thanks. This was fixed for some time in dlm.git. I'm afraid we'll need to go through the

Re: [Cluster-devel] GFS2: Pull request (fixes)

2013-04-05 Thread David Teigland
On Fri, Apr 05, 2013 at 11:34:45AM +0100, Steven Whitehouse wrote: Please consider pulling the following changes, There's some mixup here that should be cleared up first. David Teigland (2): GFS2: Fix unlock of fcntl locks during withdrawn state Steven Whitehouse (1): GFS2

[Cluster-devel] [PATCH] gfs2: use kmalloc for lvb bitmap

2013-03-05 Thread David Teigland
The temp lvb bitmap was on the stack, which could be an alignment problem for __set_bit_le. Use kmalloc for it instead. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/incore.h | 1 + fs/gfs2/lock_dlm.c | 31 ++- 2 files changed, 19 insertions(+), 13

Re: [Cluster-devel] DLM regression in 64-bit 3.7.x Kernel?

2013-02-25 Thread David Teigland
On Tue, Feb 19, 2013 at 11:55:14AM +0100, Jacek Konieczny wrote: Hi, I have recently upgraded my development cluster from 3.6.x to 3.7.1 kernel and clvmd stopped working (all locking operation result in 'Invalid argument'). I have traced the problem to this call: write(8,

Re: [Cluster-devel] [PATCH] idr: fix a subtle bug in idr_get_next()

2013-02-05 Thread David Teigland
-off-by: Tejun Heo t...@kernel.org Reported-by: David Teigland teigl...@redhat.com Cc: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com David, can you please test whether the patch makes the skipped deletion bug go away? Yes, I've tested, and it works fine now. Thanks, Dave

Re: [Cluster-devel] [PATCH 10/14] dlm: don't use idr_remove_all()

2013-02-01 Thread David Teigland
On Thu, Jan 31, 2013 at 04:18:41PM -0800, Tejun Heo wrote: It looks a bit weird to me that ls->ls_recover_list_count is also incremented by recover_list_add(). The two code paths don't seem to be interlocked at least upon my very shallow glance. Is it that only either the list or idr is in

Re: [Cluster-devel] [PATCH 10/14] dlm: don't use idr_remove_all()

2013-01-30 Thread David Teigland
On Tue, Jan 29, 2013 at 10:13:17AM -0500, David Teigland wrote: On Mon, Jan 28, 2013 at 10:57:23AM -0500, David Teigland wrote: On Fri, Jan 25, 2013 at 05:31:08PM -0800, Tejun Heo wrote: idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated

Re: [Cluster-devel] [PATCH 10/14] dlm: don't use idr_remove_all()

2013-01-29 Thread David Teigland
On Mon, Jan 28, 2013 at 10:57:23AM -0500, David Teigland wrote: On Fri, Jan 25, 2013 at 05:31:08PM -0800, Tejun Heo wrote: idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. The conversion isn't completely trivial for recover_idr_clear() as it's the only

Re: [Cluster-devel] [PATCH 09/14] dlm: use idr_for_each_entry() in recover_idr_clear() error path

2013-01-28 Thread David Teigland
tested. Signed-off-by: Tejun Heo t...@kernel.org Cc: Christine Caulfield ccaul...@redhat.com Cc: David Teigland teigl...@redhat.com Cc: cluster-devel@redhat.com --- This patch depends on an earlier idr patch and I think it would be best to route these together through -mm. Christine, David

Re: [Cluster-devel] [PATCH 10/14] dlm: don't use idr_remove_all()

2013-01-28 Thread David Teigland
idr_destroy(). Replace it with idr_remove() call inside idr_for_each_entry() loop. It goes on top so that it matches the operation order in recover_idr_del(). Only compile tested. Signed-off-by: Tejun Heo t...@kernel.org Cc: Christine Caulfield ccaul...@redhat.com Cc: David Teigland teigl

[Cluster-devel] [PATCH 2/2] gfs2: remove redundant lvb pointer

2012-11-14 Thread David Teigland
The lksb struct already contains a pointer to the lvb, so another directly from the glock struct is not needed. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/glock.c | 10 -- fs/gfs2/incore.h | 1 - fs/gfs2/lock_dlm.c | 8 fs/gfs2/quota.c | 6

[Cluster-devel] [PATCH 1/2] gfs2: only use lvb on glocks that need it

2012-11-14 Thread David Teigland
Save the effort of allocating, reading and writing the lvb for most glocks that do not use it. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/glock.c | 27 +-- fs/gfs2/glops.c | 3 ++- fs/gfs2/incore.h | 3 ++- fs/gfs2/lock_dlm.c | 12

[Cluster-devel] [PATCH] gfs2: skip dlm_unlock calls in unmount

2012-11-13 Thread David Teigland
is called because it may update the lvb of the resource. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/glock.c | 1 + fs/gfs2/incore.h | 1 + fs/gfs2/lock_dlm.c | 8 3 files changed, 10 insertions(+) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index e6c2fd5

Re: [Cluster-devel] [PATCH] gfs2: skip dlm_unlock calls in unmount

2012-11-12 Thread David Teigland
On Mon, Nov 12, 2012 at 10:44:36AM +, Steven Whitehouse wrote: - save 64 bytes of memory for every local lock (32 in gfs2_glock, 32 in dlm_rsb) - save 96 bytes of memory for every remote lock (32 in gfs2_glock, 32 in local dlm_rsb, 32 in remote dlm_lkb) - save 32 bytes of

Re: [Cluster-devel] [PATCH] gfs2: skip dlm_unlock calls in unmount

2012-11-09 Thread David Teigland
On Fri, Nov 09, 2012 at 09:45:17AM +, Steven Whitehouse wrote: + if (test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags) && + (!gl->gl_lvb[0] || gl->gl_state != LM_ST_EXCLUSIVE)) { I'm still not happy with using !gl->gl_lvb[0] to determine whether the LVB is in use or not. I think we need a

Re: [Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread David Teigland
On Thu, Nov 08, 2012 at 10:26:53AM +, Steven Whitehouse wrote: Hi, On Wed, 2012-11-07 at 14:14 -0500, David Teigland wrote: When unmounting, gfs2 does a full dlm_unlock operation on every cached lock. This can create a very large amount of work and can take a long time to complete

Re: [Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread David Teigland
On Thu, Nov 08, 2012 at 06:48:19PM +, Steven Whitehouse wrote: Converting to NL would actually be less expensive than unlock because the NL convert does not involve a reply message, but unlock does. I'm not entirely sure I follow... at least from the filesystem point of view (and

[Cluster-devel] [PATCH] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread David Teigland
is called because it may update the lvb of the resource. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/glock.c | 1 + fs/gfs2/incore.h | 1 + fs/gfs2/lock_dlm.c | 8 3 files changed, 10 insertions(+) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index e6c2fd5

[Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-07 Thread David Teigland
is called because it may update the lvb of the resource. Signed-off-by: David Teigland teigl...@redhat.com --- fs/gfs2/glock.c | 1 + fs/gfs2/incore.h | 1 + fs/gfs2/lock_dlm.c | 6 ++ 3 files changed, 8 insertions(+) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index e6c2fd5..f3a5edb

Re: [Cluster-devel] cluster4 dlm dlm_stonith - should it really fence by turning node off?

2012-11-05 Thread David Teigland
On Sat, Nov 03, 2012 at 03:58:28PM +0100, Jacek Konieczny wrote: Hello, The dlm_stonith fencing helper is really convenient when Pacemaker is in use. Though, it doesn't quite work as I would expect - when fencing is needed it requests a node to be turned off instead of rebooting. And it

Re: [Cluster-devel] cluster4 dlm: startup notification for systemd

2012-11-05 Thread David Teigland
On Sat, Nov 03, 2012 at 04:27:54PM +0100, Jacek Konieczny wrote: Hello, The two patches: [PATCH 1/2] --foreground option added to dlm_controld [PATCH 2/2] Startup notification by sd_notify() add startup notification for the systemd service unit. This way startup of services

Re: [Cluster-devel] [PATCH] dlm_stonith_{off, reboot} aliases for fence helper

2012-11-05 Thread David Teigland
On Mon, Nov 05, 2012 at 07:05:22PM +0100, Jacek Konieczny wrote: - rv = stonith_api_kick_helper(nodeid, 300, 1); + rv = stonith_api_kick_helper(nodeid, 300, turn_off); I'd like it to be reboot, but seeing the arg as bool off I figured the opposite would be on ... if you're saying that

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 09:25:08AM +, Dietmar Maurer wrote: So the observed behavior is expected? Yes, it's a stateful partition merge, and I think /var/log/messages should have mentioned something about that. When a node is partitioned from the others (e.g. network disconnected), it has

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 04:12:10PM +, Dietmar Maurer wrote: Yes, it's a stateful partition merge, and I think /var/log/messages should have mentioned something about that. When a node is partitioned from the others (e.g. network disconnected), it has to be cleanly reset before it's

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 04:26:35PM +, Dietmar Maurer wrote: I guess you're talking about the dlm_tool ls output? Yes. The fencing there means it is waiting for fenced to finish fencing before it starts dlm recovery. fenced waits for quorum. So who actually starts fencing

Re: [Cluster-devel] fence daemon problems

2012-10-03 Thread David Teigland
On Wed, Oct 03, 2012 at 04:55:55PM +, Dietmar Maurer wrote: The difficult cases, which I think you're seeing, are partitions where no group has quorum, e.g. 2/2. In this case we do nothing, and the user has to resolve it by resetting some of the nodes The problem with that is that
