Re: [Cluster-devel] [PATCH 2/2] MAINTAINERS: Update dlm mailing list

2023-09-05 Thread David Teigland
AINTAINERS b/MAINTAINERS > index caae31fb9741..946fcf6c8d77 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -6093,7 +6093,7 @@ F: include/video/udlfb.h > DISTRIBUTED LOCK MANAGER (DLM) > M: Christine Caulfield > M: David Teigland > -L: cluster-devel@redhat.co

[Cluster-devel] [GIT PULL] dlm updates for 6.5

2023-06-29 Thread David Teigland
Hi Linus, Please pull dlm updates from tag: git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-6.5 The dlm posix lock handling (for gfs2) has three notable changes: - Local pids returned from GETLK are no longer negated. A previous patch negating remote pids mistakenly

[Cluster-devel] [GIT PULL] dlm updates for 6.4

2023-04-25 Thread David Teigland
Hi Linus, Please pull dlm updates from tag: git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-6.4 Change summary: Remove some unused features (related to lock timeouts) that have been previously scheduled for removal. Fix a bug where the pending callback flag would be in

[Cluster-devel] [GIT PULL] dlm updates for 6.3

2023-02-20 Thread David Teigland
Hi Linus, Please pull dlm updates from tag: git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-6.3 This patch set fixes some races in the lowcomms startup and shutdown code that were found by targeted stress testing that quickly and repeatedly joins and leaves lockspaces.

[Cluster-devel] [GIT PULL] dlm updates for 6.2

2022-12-12 Thread David Teigland
Hi Linus, Please pull dlm updates from tag: git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-6.2 These patches include the usual cleanups and minor fixes, removal of code that is no longer needed due to recent improvements, and improvements to processing large volumes of

Re: [Cluster-devel] [PATCH v2][next] dlm: Replace one-element array with flexible-array member

2022-10-11 Thread David Teigland
On Mon, Oct 10, 2022 at 03:35:24PM -0700, Kees Cook wrote: > On Mon, Oct 10, 2022 at 04:00:39PM -0500, David Teigland wrote: > > On Sat, Oct 08, 2022 at 09:03:28PM -0700, Kees Cook wrote: > > > On Sun, Oct 09, 2022 at 03:05:17PM +1300, Paulo Miguel Almeida wrote: > > > &

Re: [Cluster-devel] [PATCH v2][next] dlm: Replace one-element array with flexible-array member

2022-10-10 Thread David Teigland
On Sat, Oct 08, 2022 at 09:03:28PM -0700, Kees Cook wrote: > On Sun, Oct 09, 2022 at 03:05:17PM +1300, Paulo Miguel Almeida wrote: > > On Sat, Oct 08, 2022 at 05:18:35PM -0700, Kees Cook wrote: > > > This is allocating 1 more byte than before, since the struct size didn't > > > change. But this ha
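
The "1 more byte" remark above can be illustrated with a small userspace sketch. The structs and helper names below are hypothetical and are not the real dlm definitions; the point is only that when tail padding keeps sizeof() the same after a one-element array becomes a flexible-array member, dropping the legacy "- 1" from the size calculation allocates one extra byte.

```c
#include <stddef.h>

/* Hypothetical message layouts for illustration; NOT the real dlm structs. */
struct msg_one_elem {
    int  len;
    char flags;
    char name[1];   /* legacy one-element trailing array */
};

struct msg_flex {
    int  len;
    char flags;
    char name[];    /* C99 flexible-array member */
};

/* Legacy idiom: the struct already accounts for one byte of name. */
static size_t one_elem_alloc_size(size_t namelen)
{
    return sizeof(struct msg_one_elem) + namelen - 1;
}

/* Flexible-array idiom: the struct accounts for no bytes of name. */
static size_t flex_alloc_size(size_t namelen)
{
    return sizeof(struct msg_flex) + namelen;
}
```

On common ABIs both structs are likely padded to the same size, so flex_alloc_size(n) comes out one byte larger than one_elem_alloc_size(n). That is harmless, but it is a change in allocation size rather than a pure no-op conversion.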

[Cluster-devel] [GIT PULL] dlm updates for 6.1

2022-10-03 Thread David Teigland
Hi Linus, Please pull dlm updates from tag: git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-6.1 This set of commits includes: . Fix a couple races found with a new torture test. . Improve errors when api functions are used incorrectly. . Improve tracing for lock requests

Re: [Cluster-devel] [GIT PULL] dlm updates for 6.0

2022-08-01 Thread David Teigland
On Mon, Aug 01, 2022 at 09:17:30AM -0700, Linus Torvalds wrote: > But again: please don't rebase stuff you have already exposed to > others. It causes real issues. This was just one example of it. > > And if you *do* have to rebase for a real technical reason ("Oh, that > was a disaster, it absolu

Re: [Cluster-devel] [GIT PULL] dlm updates for 6.0

2022-08-01 Thread David Teigland
On Mon, Aug 01, 2022 at 10:50:28AM -0500, David Teigland wrote: > On Mon, Aug 01, 2022 at 08:46:24AM -0700, Linus Torvalds wrote: > > On Mon, Aug 1, 2022 at 7:43 AM David Teigland wrote: > > > > > > (You can ignore the premature 5.20 pull request from some weeks ago.)

Re: [Cluster-devel] [GIT PULL] dlm updates for 6.0

2022-08-01 Thread David Teigland
On Mon, Aug 01, 2022 at 08:46:24AM -0700, Linus Torvalds wrote: > On Mon, Aug 1, 2022 at 7:43 AM David Teigland wrote: > > > > (You can ignore the premature 5.20 pull request from some weeks ago.) > > Gaah. That was the first thing I pulled this morning because it was >

[Cluster-devel] [GIT PULL] dlm updates for 6.0

2022-08-01 Thread David Teigland
Hi Linus, Please pull dlm updates from tag: git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-6.0 Changes in this set of commits: . Delay the cleanup of interrupted posix lock requests until the user space result arrives. Previously, the immediate cleanup would lead t

Re: [Cluster-devel] [PATCH dlm/next 1/4] fs: dlm: replace sanity checks with WARN_ON

2022-02-17 Thread David Teigland
On Thu, Feb 17, 2022 at 01:36:44AM +0100, Andreas Gruenbacher wrote: > On Wed, Feb 16, 2022 at 5:16 PM Alexander Aring wrote: > > > > - spin_lock(&ops_lock); > > > > - if (!list_empty(&op->list)) { > > > > - log_error(ls, "dlm_posix_lock: op on list %llx", > > > > -

Re: [Cluster-devel] [RFC PATCH dlm 00/10] dlm_controld config settings can be

2021-09-22 Thread David Teigland
On Wed, Sep 22, 2021 at 05:32:49PM +0800, heming.z...@suse.com wrote: > If there is no chance to add dynamic updating setting by run command. > Is it a good idea to add a parameter "-I", like "dlm_tool -I reload_config". > "-I" means directly change without reading from dlm.conf. > When users want

Re: [Cluster-devel] [RFC PATCH dlm 00/10] dlm_controld config settings can be

2021-09-21 Thread David Teigland
On Tue, Sep 21, 2021 at 02:38:45PM +0800, heming.z...@suse.com wrote: > But I am ok with the reload_config idea, it's more basic. > We could give dlm_controld a chance to change behavior on the fly. > If needed, I could file a new patch for feature "reload_config", can I do it? Yes, I'd welcome a

Re: [Cluster-devel] [RFC PATCH dlm 00/10] dlm_controld config settings can be

2021-09-20 Thread David Teigland
On Sun, Sep 19, 2021 at 02:43:12PM +0800, Heming Zhao wrote: > This new feature gives dlm ability to change config settings dynamically. Hi Heming, Letting dlm_controld reload certain settings from dlm.conf makes sense, but I'd like something more basic. Let the user edit dlm.conf, then run dlm_

Re: [Cluster-devel] [BUG] fs: dlm: possible ABBA deadlock

2021-08-19 Thread David Teigland
On Thu, Aug 19, 2021 at 04:54:57PM +0800, Jia-Ju Bai wrote: > Hello, > > My static analysis tool reports a possible ABBA deadlock in the dlm > filesystem in Linux 5.10: > > dlm_recover_waiters_pre() >   mutex_lock(&ls->ls_waiters_mutex); --> line 5130 >   recover_convert_waiter() >     _receive_c

Re: [Cluster-devel] Why does dlm_lock function fails when downconvert a dlm lock?

2021-08-16 Thread David Teigland
On Mon, Aug 16, 2021 at 09:41:18AM -0500, David Teigland wrote: > On Fri, Aug 13, 2021 at 02:49:04PM +0800, Gang He wrote: > > Hi David, > > > > On 2021/8/13 1:45, David Teigland wrote: > > > On Thu, Aug 12, 2021 at 01:44:53PM +0800, Gang He wrote: > > >

Re: [Cluster-devel] Why does dlm_lock function fails when downconvert a dlm lock?

2021-08-16 Thread David Teigland
On Fri, Aug 13, 2021 at 02:49:04PM +0800, Gang He wrote: > Hi David, > > On 2021/8/13 1:45, David Teigland wrote: > > On Thu, Aug 12, 2021 at 01:44:53PM +0800, Gang He wrote: > > > In fact, I can reproduce this problem stably. > > > I want to know if this error ha

Re: [Cluster-devel] Why does dlm_lock function fails when downconvert a dlm lock?

2021-08-12 Thread David Teigland
On Thu, Aug 12, 2021 at 01:44:53PM +0800, Gang He wrote: > In fact, I can reproduce this problem stably. > I want to know if this error happen is by our expectation? since there is > not any extreme pressure test. > Second, how should we handle these error cases? call dlm_lock function > again? may

Re: [Cluster-devel] [PATCH -next] fs: dlm: fix missing unlock on error in accept_from_sock()

2021-03-29 Thread David Teigland
On Sat, Mar 27, 2021 at 04:37:04PM +0800, Yang Yingliang wrote: > Add the missing unlock before return from accept_from_sock() > in the error handling case. Thanks, applied to the next branch. Dave > Fixes: 6cde210a9758 ("fs: dlm: add helper for init connection") > Reported-by: Hulk Robot > Sign

Re: [Cluster-devel] [PATCH 1/4] sctp: add sctp_sock_set_nodelay

2020-05-29 Thread David Teigland
On Fri, May 29, 2020 at 02:09:40PM +0200, Christoph Hellwig wrote: > Add a helper to directly set the SCTP_NODELAY sockopt from kernel space > without going through a fake uaccess. Ack, they look fine to me, thanks. Dave

Re: [Cluster-devel] is it ok to always pull in sctp for dlm, was: Re: [PATCH 27/33] sctp: export sctp_setsockopt_bindx

2020-05-14 Thread David Teigland
On Thu, May 14, 2020 at 12:40:40PM +0200, Christoph Hellwig wrote: > On Wed, May 13, 2020 at 03:00:58PM -0300, Marcelo Ricardo Leitner wrote: > > On Wed, May 13, 2020 at 08:26:42AM +0200, Christoph Hellwig wrote: > > > And call it directly from dlm instead of going through kernel_setsockopt. > > >

Re: [Cluster-devel] [PATCH] dlm: no need to check return value of debugfs_create functions

2019-06-12 Thread David Teigland
On Wed, Jun 12, 2019 at 05:25:36PM +0200, Greg Kroah-Hartman wrote: > When calling debugfs functions, there is no need to ever check the > return value. The function can work or not, but the code logic should > never do something different based on this. Thanks, pushed to next branch in linux-dlm

Re: [Cluster-devel] [PATCH v2 1/3] dlm: check if workqueues are NULL before flushing/destroying

2019-04-05 Thread David Teigland
I tried these with one address and had no problem, so I've pushed them to the linux-dlm next branch. Dave

Re: [Cluster-devel] [GFS2 PATCH] gfs2: Panic when an io error occurs writing to the journal

2018-12-17 Thread David Teigland
On Mon, Dec 17, 2018 at 09:58:47AM -0500, Bob Peterson wrote: > Dave Teigland recommended. Unless I'm mistaken, Dave has said that GFS2 > should never withdraw; it should always just kernel panic (Dave, correct > me if I'm wrong). At least this patch confines that behavior to a small > subset of wi

Re: [Cluster-devel] lost idr_destroy for ls_recover_idr in release_lockspace() ?

2018-11-15 Thread David Teigland
On Thu, Nov 15, 2018 at 09:49:17AM +0300, Vasily Averin wrote: > Dear David, > I've noticed that release_lockspace() lacks idr_destroy(&ls->ls_recover_idr), > though it is called on rollback in new_lockspace(). > > It seems for me it is not critical, and should not lead to any leaks, > however cou

Re: [Cluster-devel] [PATCH 0/3] dlm: fix various incorrect behaviors

2018-11-07 Thread David Teigland
On Fri, Nov 02, 2018 at 02:18:19PM -0600, Tycho Andersen wrote: > Hi, > > here's a series to fix some bugs I noticed in the DLM. The third patch > in the series and maybe the first should probably go to stable, assuming > everyone agrees they're indeed bugs. > > Thanks, > > Tycho > > Tycho Ande

Re: [Cluster-devel] How to enable daemon_debug for dlm_controld

2018-06-27 Thread David Teigland
On Wed, Jun 27, 2018 at 04:07:18PM +0800, Guoqing Jiang wrote: > But by default, seems dlm_controld just run with "-s 0". And I tried to add > "daemon_debug=1" to /etc/dlm/dlm.conf, > then dlm resource can't start at all. Could you tell me how to enable this > option? Thanks in advance! That optio

Re: [Cluster-devel] [PATCH] dlm: prompt the user SCTP is experimental

2018-04-02 Thread David Teigland
On Thu, Mar 22, 2018 at 10:27:56PM -0600, Gang He wrote: > Hello David, > > Do you agree to add this prompt to the user? > Since sometimes customers attempted to setup SCTP protocol with two rings, > but they could not get the expected result, then it maybe bring some concerns > to the customer

Re: [Cluster-devel] [ClusterLabs] DLM connection channel switch take too long time (> 5mins)

2018-03-08 Thread David Teigland
> I use active rrp_mode in corosync.conf and reboot the cluster to let the > configuration effective. > But, the about 5 mins hang in new_lockspace() function is still here. The last time I tested connection failures with sctp was several years ago, but I recall seeing similar problems. I had ho

Re: [Cluster-devel] [BUG] fs/dlm: A possible sleep-in-atomic bug in dlm_master_lookup

2017-10-09 Thread David Teigland
On Sat, Oct 07, 2017 at 03:26:11AM +0100, Al Viro wrote: > On Sat, Oct 07, 2017 at 09:59:41AM +0800, Jia-Ju Bai wrote: > > According to fs/dlm/lock.c, the kernel may sleep under a spinlock, > > and the function call path is: > > dlm_master_lookup (acquire the spinlock) > > dlm_send_rcom_lookup_du

Re: [Cluster-devel] [PATCH] dlm/recoverd: recheck kthread_should_stop() before schedule()

2017-09-25 Thread David Teigland
On Mon, Sep 25, 2017 at 03:47:50PM +0800, Guoqing Jiang wrote: > Call schedule() here could make the thread miss wake > up from kthread_stop(), so it is better to recheck > kthread_should_stop() before call schedule(), a symptom > happened when I run indefinite test (which mostly created > clustere

Re: [Cluster-devel] [PATCH 00/18] [try #2] DLM: dlm patches need review

2017-09-18 Thread David Teigland
The patches are now here for testing https://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git/log/?h=next Dave

Re: [Cluster-devel] [PATCH 00/18] [try #2] DLM: dlm patches need review

2017-09-13 Thread David Teigland
On Wed, Sep 13, 2017 at 10:51:06AM +0100, Steven Whitehouse wrote: > Hi, > > > On 12/09/17 09:54, tsutomu@toshiba.co.jp wrote: > > Hi, > > > > This series of patches (2nd version after previous review on August) is to > > fix various bugs. This patch set is against the mainline kernel. > >

Re: [Cluster-devel] [PATCH 13/18] [try #2] DLM: fix conversion deadlock when DLM_LKF_NODLCKWT flag is set

2017-09-12 Thread David Teigland
On Tue, Sep 12, 2017 at 09:01:31AM +0000, tsutomu@toshiba.co.jp wrote: > When the DLM_LKF_NODLCKWT flag was set, even if conversion deadlock > was detected, the caller of can_be_granted() was unknown. > We change the behavior of can_be_granted() and change it to detect > conversion deadlock reg

Re: [Cluster-devel] [PATCH 10/17] dlm: use schedule_timeout instead of schedule in dlm_recoverd

2017-08-22 Thread David Teigland
On Thu, Aug 17, 2017 at 11:40:13PM +0000, tsutomu@toshiba.co.jp wrote: > If you refer to other implementations in kernel, the following > modifications may be better. > The important thing is to call kthread_should_stop() after > set_current_state(TASK_INTERRUPTIBLE). How is this fix? Thanks,

Re: [Cluster-devel] [PATCH 13/17] dlm: fix _can_be_granted() for lock at the head of covert queue.

2017-08-09 Thread David Teigland
On Wed, Aug 09, 2017 at 11:41:44AM -0500, David Teigland wrote: > On Wed, Aug 09, 2017 at 05:51:37AM +0000, tsutomu@toshiba.co.jp wrote: > > If there is a lock resource conflict on multiple nodes, the lock on > > convert queue may not be granted forever. > > >

Re: [Cluster-devel] [PATCH 13/17] dlm: fix _can_be_granted() for lock at the head of covert queue.

2017-08-09 Thread David Teigland
On Wed, Aug 09, 2017 at 05:51:37AM +0000, tsutomu@toshiba.co.jp wrote: > If there is a lock resource conflict on multiple nodes, the lock on > convert queue may not be granted forever. > > EX.) > grant queue: > node0 grmode NL / rqmode IV > node1 grmode NL / rqmode IV > > convert queu

Re: [Cluster-devel] [PATCH 08/17] dlm: retry rcom when dlm_wait_function is timed out.

2017-08-09 Thread David Teigland
On Wed, Aug 09, 2017 at 05:50:31AM +0000, tsutomu@toshiba.co.jp wrote: > If a node sends a DLM_RCOM_STATUS command and an error occurs on the > receiving side, the DLM_RCOM_STATUS_REPLY response may not be returned. > We retransmitted the DLM_RCOM_STATUS command so that we do not wait for > an

Re: [Cluster-devel] [PATCH 10/17] dlm: use schedule_timeout instead of schedule in dlm_recoverd

2017-08-09 Thread David Teigland
On Wed, Aug 09, 2017 at 05:51:01AM +0000, tsutomu@toshiba.co.jp wrote: > When dlm_recoverd_stop() is called between kthread_should_stop() and > set_task_state(), dlm_recoverd will not wake up. This works, but have you looked elsewhere in the kernel for kthread examples we could copy that do a

Re: [Cluster-devel] [PATCH] dlm: use sock_create_lite inside tcp_accept_from_sock

2017-08-07 Thread David Teigland
On Mon, Aug 07, 2017 at 02:31:20PM +0800, Guoqing Jiang wrote: > To resolve the issue, we need to use sock_create_lite > instead of sock_create_kern, like commit 0933a578cd55 > ("rds: tcp: use sock_create_lite() to create the accept > socket") did. Thanks, this is now in linux-dlm next. Dave

Re: [Cluster-devel] DLM: Do not count redundant connection attempts against retries

2017-04-20 Thread David Teigland
On Thu, Apr 20, 2017 at 04:02:20PM -0400, Bob Peterson wrote: > Hi, > > Before this patch, multiple GFS2 mounts would result in multiple > connection attempts. They were all ignored, and rightly so, but > they were being counted against the connection attempt retries. > This patch moves the retry

Re: [Cluster-devel] [PATCH v2] fs/dlm: Fix kernel memory disclosure

2017-02-22 Thread David Teigland
On Wed, Feb 22, 2017 at 03:45:34PM +0800, Vlad Tsyrklevich wrote: > Hello, I wanted to ping the list and see if this could get a review: now pushed to linux-dlm.git > > Clear the 'unused' field and the uninitialized padding in 'lksb' to > > avoid leaking memory to userland in copy_result_to_user(
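
The fix described above (clearing an unused field and uninitialized padding before a struct is copied out) can be sketched in userspace as follows. The struct and function names are hypothetical and do not reflect the real dlm layout; this only shows the general technique of zeroing the whole object before filling in the fields the caller should see.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result record; NOT the real dlm layout. */
struct result {
    uint8_t  status;   /* usually followed by padding before 'flags' */
    uint32_t flags;
    uint8_t  unused;   /* reserved field, usually followed by padding */
    uint64_t id;
};

/* Zero the whole object first so padding and 'unused' carry no stale data. */
static void fill_result(struct result *r, uint8_t status,
                        uint32_t flags, uint64_t id)
{
    memset(r, 0, sizeof(*r));
    r->status = status;
    r->flags  = flags;
    r->id     = id;
}

/* Return 1 if every byte in [from, to) of the object is zero. */
static int gap_is_zero(const void *base, size_t from, size_t to)
{
    const unsigned char *p = base;

    for (size_t i = from; i < to; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Check the padding gaps after 'status' and after 'unused'. */
static int padding_is_zero(const struct result *r)
{
    return gap_is_zero(r, offsetof(struct result, status) + sizeof(r->status),
                       offsetof(struct result, flags)) &&
           gap_is_zero(r, offsetof(struct result, unused) + sizeof(r->unused),
                       offsetof(struct result, id));
}
```

Without the memset(), the padding bytes would keep whatever the stack or heap held before, and a byte-wise copy of the struct to another context would expose them.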

Re: [Cluster-devel] Question on LVB when the node that held EX lock crash

2016-11-30 Thread David Teigland
On Wed, Nov 30, 2016 at 05:07:22PM +0800, Eric Ren wrote: > @@ -852,12 +868,19 @@ void dlm_recover_rsbs(struct dlm_ls *ls) > if (is_master(r)) { > if (rsb_flag(r, RSB_RECOVER_CONVERT)) > recover_conversion(r); > + > +

Re: [Cluster-devel] Question on LVB when the node that held EX lock crash

2016-11-16 Thread David Teigland
On Wed, Nov 16, 2016 at 04:42:09PM +0800, Eric Ren wrote: > On 11/16/2016 04:29 PM, Eric Ren wrote: > > Hi David and all, > > > > I am debugging an issue of ocfs2 that relates to LVB value. I will try > > to make it a pure DLM question: > > > > Two nodes (N1, N2) try to truncate the same file(R1)

Re: [Cluster-devel] [DLM PATCH] DLM: Don't specify WQ_UNBOUND for the ast callback workqueue

2016-10-19 Thread David Teigland
On Wed, Oct 19, 2016 at 11:34:54AM -0400, Bob Peterson wrote: > Hi, > > This patch removes the WQ_UNBOUND flag (which implies WQ_HIGHPRI) > from the DLM's ast work queue, in favor of just WQ_HIGHPRI. > This has been shown to cause a 19 percent performance increase for > simultaneous inode creates

Re: [Cluster-devel] About dlm_unlock (kernel space)

2016-06-13 Thread David Teigland
On Mon, Jun 13, 2016 at 07:15:09AM -0400, Guoqing Jiang wrote: > Hi, > > In case we have set DLM_LKF_CONVERT flag for dlm_lock, is it > possible that the convert > queue could be NULL or not NULL while perform unlock? I think there > are two different > cases would appear when call dlm_unlock: >

Re: [Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful

2016-05-18 Thread David Teigland
On Wed, May 18, 2016 at 02:53:00PM +0800, Eric Ren wrote: > Q1: what's stateful merged node? > Q2: what if we add the stateful merged nodes to dlm_controld daemon > cpg instead of fencing them? The details here are fundamental to the way dlm works because the dlm depends on the properties of Virt

Re: [Cluster-devel] [DLM PATCH] dlm_controld: add option of enable_force_kick

2016-05-16 Thread David Teigland
On Mon, May 16, 2016 at 04:07:18PM +0800, Eric Ren wrote: > When there are 3 or more partitions that merge, none may see enough > clean nodes. Therefore, DLM would be stuck there forever until administrator > manually reset/restart enough nodes to produce sufficient clean nodes. > However, sometime

Re: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection

2016-05-16 Thread David Teigland
On Mon, May 16, 2016 at 03:44:27PM +0800, Eric Ren wrote: > Thanks! Hum, according to the long comments, you've handled the 2/2 > even split by way of the low nodeid killing stateful merged > numbers. Interesting, I'd forgotten about that bit of code, so I was wrong to say that we do nothing afte

Re: [Cluster-devel] inconsistent dlm_new_lockspace LVB_LEN size from ocfs2 user-space tool and ocfs2 kernel module

2016-05-13 Thread David Teigland
On Fri, May 13, 2016 at 02:36:25AM -0600, Gang He wrote: > Here is a inconsistent LVB_LEN size problem when create a new lockspace > from user-space tool (e.g. fsck.ocfs2) and kernel module (e.g. > ocfs2/stack_user.c). > From the userspace tool, the LVB size is DLM_USER_LVB_LEN (32 bytes, > define

Re: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection

2016-05-13 Thread David Teigland
On Fri, May 13, 2016 at 01:45:47PM +0800, Eric Ren wrote: > >the cluster. Neither option is good. In the past we decided to let the > >cluster sit in this state so an admin could choose which nodes to remove. > >Do you prefer the alternative of kicking nodes in this case (with somewhat > >unpredi

Re: [Cluster-devel] [DLM PATCH] dlm_controld: handle the case of network transient disconnection

2016-05-12 Thread David Teigland
On Thu, May 12, 2016 at 05:16:08PM +0800, Eric Ren wrote: > DLM would be stuck in "need fencing" state, although cluster can > regain quorum very quickly after a network transient disconnection. > > It's possible that this process happens within one monoclock. It > means "cluster_quorate_monotime"

Re: [Cluster-devel] [DLM PATCH 0/6] Misc DLM Improvements Regarding Socket Errors

2016-02-15 Thread David Teigland
On Mon, Feb 15, 2016 at 04:16:17PM -0500, Bob Peterson wrote: > I think the "right thing to do" at this point is this: > > 1. Patch #1 is already upstream > 2. Patch #2 stands on its own, so I think this should go forward. > 3. Combine patches 3, 4 and 5, which ought to provide a comprehensive fix

Re: [Cluster-devel] [DLM PATCH 0/6] Misc DLM Improvements Regarding Socket Errors

2016-02-11 Thread David Teigland
On Thu, Feb 11, 2016 at 01:39:09PM -0500, Bob Peterson wrote: > The problem is: While testing the dlm in multiple recovery situations, > Nate and I discovered multiple problems. Until recently, no one has tried > to run recovery tests on an upstream DLM, (Let's distinguish tcp connection testing/r

Re: [Cluster-devel] [DLM PATCH 0/6] Misc DLM Improvements Regarding Socket Errors

2016-02-11 Thread David Teigland
ms introduced here: From b3a5bbfd780d9e9291f5f257be06e9ad6db11657 Mon Sep 17 00:00:00 2001 From: Bob Peterson Date: Thu, 27 Aug 2015 09:34:47 -0500 Subject: [PATCH] dlm: print error from kernel_sendpage Print a dlm-specific error when a socket error occurs when sending a dlm message. Signed-off-by: Bob Peterson Si

Re: [Cluster-devel] DLM Shutdown

2016-02-10 Thread David Teigland
On Wed, Feb 10, 2016 at 09:38:58PM +0100, Andreas Gruenbacher wrote: > On Wed, Feb 10, 2016 at 9:18 PM, David Teigland wrote: > > On Wed, Feb 10, 2016 at 08:48:12PM +0100, Andreas Gruenbacher wrote: > >> When a shutdown is requested, shouldn't dlm_controld really relea

Re: [Cluster-devel] DLM Shutdown

2016-02-10 Thread David Teigland
On Wed, Feb 10, 2016 at 08:48:12PM +0100, Andreas Gruenbacher wrote: > When a shutdown is requested, shouldn't dlm_controld really release > lockspaces in a similar way as well? You could probably do that if you check that the lockspace is managing no local locks (which would be a pain). If locks

Re: [Cluster-devel] DLM Shutdown

2016-02-10 Thread David Teigland
On Wed, Feb 10, 2016 at 02:33:49AM +0100, Andreas Gruenbacher wrote: > never actively releases existing lockspaces. This means that as soon > as any application creates the default lockspace (via libdlm), or if > an application doesn't release any lockspaces it creates, dlm_controld > will never sh

Re: [Cluster-devel] [GFS2 PATCH 1/2] GFS2: Make gfs2_clear_inode() queue the final put

2015-12-04 Thread David Teigland
On Fri, Dec 04, 2015 at 09:51:53AM -0500, Bob Peterson wrote: > it's from the fenced process, and if so, queue the final put. That should > mitigate the problem. Bob, I'm perplexed by the focus on fencing; this issue is broader than fencing as I mentioned in bz 1255872. Over the years that I've r

Re: [Cluster-devel] problem about dlm posix file lock (sorry for missing subjuct)

2015-10-13 Thread David Teigland
On Tue, Oct 13, 2015 at 04:30:53AM -0600, Zhen Ren wrote: > It expects alarm timeout to send SIGALRM, and wake up the sleep process, > as "man fcntl" says: "If a signal is caught while waiting, then > the call is interrupted and (after the signal handler has returned) > returns immediately (wi

Re: [Cluster-devel] [PATCH 17/23] dlm: use per-attribute show and store methods

2015-09-28 Thread David Teigland
On Fri, Sep 25, 2015 at 06:49:54AM -0700, Christoph Hellwig wrote: > Signed-off-by: Christoph Hellwig > --- > fs/dlm/config.c | 288 > +++- > 1 file changed, 74 insertions(+), 214 deletions(-) Looks good to me. Dave

Re: [Cluster-devel] PROBLEM: dlm: BUG_ON on "con->nodeid == 0" when connect from unknown address

2015-08-03 Thread David Teigland
On Mon, Aug 03, 2015 at 07:20:55PM +0800, tan...@zte.com.cn wrote: > When using SCTP protocol in dlm and it received connecting request > from unknown address, the function receive_from_sock may directly > shutdown the connection through process_sctp_notification. If still > messages received fr

Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-11 Thread David Teigland
On Thu, Jun 11, 2015 at 05:47:28PM +0800, Guoqing Jiang wrote: > Do you consider take the following clean up? If yes, I will send a > formal patch, otherwise pls ignore it. On first glance, the old and new code do not appear to do the same thing, so let's leave it as it is. > - to_nodeid =

Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-10 Thread David Teigland
On Wed, Jun 10, 2015 at 11:10:44AM +0800, Guoqing Jiang wrote: > The remove_from_waiters could only be invoked after failed to > create_message, right? > Since send_message always returns 0, this patch doesn't touch anything > about the failure > path, and it also doesn't change the original seman

Re: [Cluster-devel] Kernel crash at DLM: kernel BUG at /usr/src/packages/BUILD/dlm-1.6.fio/obj/default/lowcomms.c:715!

2015-01-29 Thread David Teigland
On Thu, Jan 29, 2015 at 03:50:58AM +, Pralay Dakua wrote: > 645 static int receive_from_sock(struct connection *con) > 646 { > > > 704 > 705 /* Process SCTP notifications */ > 706 if (msg.msg_flags & MSG_NOTIFICATION) { > 707 msg.msg_control = incmsg; >

Re: [Cluster-devel] [RFA][PATCH 5/8] dlm: Remove seq_printf() return checks and use seq_has_overflowed()

2014-11-04 Thread David Teigland
On Tue, Nov 04, 2014 at 08:08:52AM -0500, Steven Rostedt wrote: > On Wed, 29 Oct 2014 17:56:07 -0400 > Steven Rostedt wrote: > > > From: Joe Perches > > > > [ REQUEST FOR ACKS ] > > Can any of the DLM maintainers give me an Acked-by for this? Looks ok, Dave

Re: [Cluster-devel] [DLM PATCH] DLM: Don't wait for resource library lookups if NOLOOKUP is specified

2014-10-01 Thread David Teigland
On Wed, Oct 01, 2014 at 01:21:41PM -0400, Bob Peterson wrote: > Hi, > > This patch adds a new lock flag, DLM_LKF_NOLOOKUP, which instructs DLM > to refrain from sending lookup requests in cases where the lock library > node is not the current node. This is similar to the DLM_LKF_NOQUEUE > flag, ex

Re: [Cluster-devel] [PATCH 9/9] fs: dlm: lockd: Convert int result to unsigned char type

2014-07-23 Thread David Teigland
On Wed, Jul 23, 2014 at 02:11:39PM -0400, Jeff Layton wrote: > On Sun, 20 Jul 2014 11:23:43 -0700 > Joe Perches wrote: > > > op->info.rv is an s32, but it's only used as a u8. > > > > I don't understand this patch. info.rv is s32 (and I assume that "rv" > stands for "return value"). What I don'

Re: [Cluster-devel] [RFC PATCH] dlm: Remove unused conf from lm_grant

2014-07-01 Thread David Teigland
On Tue, Jul 01, 2014 at 01:16:32PM -0400, Bob Peterson wrote: > - Original Message - > > On Tue, Jul 01, 2014 at 10:43:13AM -0400, Jeff Layton wrote: > > > On Tue, 01 Jul 2014 06:20:10 -0700 > > > Joe Perches wrote: > > > > > > > While doing a bit of adding argument names to fs.h, > > > >

Re: [Cluster-devel] [RFC PATCH] dlm: Remove unused conf from lm_grant

2014-07-01 Thread David Teigland
On Tue, Jul 01, 2014 at 10:43:13AM -0400, Jeff Layton wrote: > On Tue, 01 Jul 2014 06:20:10 -0700 > Joe Perches wrote: > > > While doing a bit of adding argument names to fs.h, > > I looked at lm_grant and it seems the 2nd argument > > is always NULL. > > > > How about removing it? > > > > This

Re: [Cluster-devel] [PATCH] dlm_controld: fix name printing error in logging

2014-04-25 Thread David Teigland
On Fri, Apr 25, 2014 at 03:21:48PM +0800, Lidong Zhong wrote: > When the length of name_in is NAME_ID_SIZE, the last byte of the name > and a whitespace will get lost. Thanks, I modified your patch to handle longer names also... commit 4283123f0b13eafc46d825050c5142cf44be79c3 Author: Lidong Zhong

Re: [Cluster-devel] CMAN/DLM without SCTP

2014-02-26 Thread David Teigland
On Wed, Feb 26, 2014 at 04:52:14PM +0530, Pratik Mehta wrote: > On Wed, Feb 19, 2014 at 11:43 PM, David Teigland > wrote: > > > > > > That's a fine solution. You might also be able to use > > 'service cman start quorum'. > > Apart from DLM,

Re: [Cluster-devel] CMAN/DLM without SCTP

2014-02-19 Thread David Teigland
On Tue, Feb 18, 2014 at 07:03:44PM +0530, Pratik Mehta wrote: > Hi, > I am trying to use a cluster with Pacemaker + CMAN on CentOS 6.4. The > application that runs on the cluster includes a userspace SCTP stack. > However CMAN loads dlm which loads the Linux kernel sctp module, which > interferes w

Re: [Cluster-devel] [PATCH] dlm: Avoid that dlm_release_lockspace() incorrectly returns -EBUSY

2013-10-16 Thread David Teigland
On Wed, Oct 16, 2013 at 02:20:25PM +0200, Bart Van Assche wrote: > When dlm_release_lockspace(ls, 1) is invoked on a busy system I've pushed this to the next branch. Thanks, Dave

Re: [Cluster-devel] [patch] dlm: some checks can underflow

2013-07-31 Thread David Teigland
On Wed, Jul 31, 2013 at 12:02:29PM +0300, Dan Carpenter wrote: > This is a static checker fix. We have several places here that check > the upper limit without checking for negative numbers. One example of > this is in find_rsb(). > > My static checker marks endian data as user controlled so. Th
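
The class of bug described above (an upper-limit check on a signed value with no check for negative numbers) can be shown with a minimal sketch. The limit and function names are hypothetical and are not taken from fs/dlm.

```c
#include <errno.h>

#define MAX_NAME_LEN 64   /* hypothetical limit, for illustration only */

/* Upper-bound check only: a negative length slips through as "valid". */
static int check_len_upper_only(int len)
{
    if (len > MAX_NAME_LEN)
        return -EINVAL;
    return 0;
}

/* Checks both bounds, so negative lengths are rejected as well. */
static int check_len_both(int len)
{
    if (len < 0 || len > MAX_NAME_LEN)
        return -EINVAL;
    return 0;
}
```

Here check_len_upper_only(-1) returns 0, so a later use of the length as a size or index would underflow, while check_len_both(-1) returns -EINVAL. Declaring the length as an unsigned type is another common way to close the gap.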

Re: [Cluster-devel] [PATCH] dlm: Avoid LVB truncation

2013-06-26 Thread David Teigland
On Wed, Jun 26, 2013 at 02:27:57PM +0200, Bart Van Assche wrote: > For lockspaces with an LVB length above 64 bytes, avoid truncating > the LVB while exchanging it with another node in the cluster. Thanks, I've added this to next. Dave

Re: [Cluster-devel] [PATCH 0/6] dlm: sctp use fixes

2013-06-14 Thread David Teigland
On Fri, Jun 14, 2013 at 04:56:08AM -0500, micha...@cs.wisc.edu wrote: > The following patches made over Linus's tree fix a handful of bugs > that occur when the initial IP addr cannot be reached when using > SCTP. Thanks Mike, I've pushed these to the linux-dlm next branch. Dave

[Cluster-devel] [PATCH] gfs2: add native setup to man page

2013-05-14 Thread David Teigland
List the simplest sequence of steps to manually set up and run gfs2/dlm. Signed-off-by: David Teigland --- gfs2/man/gfs2.5 | 188 1 file changed, 188 insertions(+) diff --git a/gfs2/man/gfs2.5 b/gfs2/man/gfs2.5 index 25effdd..220a10d

[Cluster-devel] [PATCH] gfs2: add native setup to man page

2013-05-13 Thread David Teigland
List the simplest sequence of steps to manually set up and run gfs2/dlm. Signed-off-by: David Teigland --- gfs2/man/gfs2.5 | 188 1 file changed, 188 insertions(+) diff --git a/gfs2/man/gfs2.5 b/gfs2/man/gfs2.5 index 25effdd..eb12934

Re: [Cluster-devel] linux-next: Tree for May 8 (dlm)

2013-05-09 Thread David Teigland
On Thu, May 09, 2013 at 09:47:45AM +1000, Stephen Rothwell wrote: > [Just forwarding to David ...] > > On Wed, 08 May 2013 11:04:45 -0700 Randy Dunlap wrote: > > > > on x86_64: > > > > when CONFIG_GFS2_FS_LOCKING_DLM=y and CONFIG_DLM=m: > > > > fs/built-in.o: In function `gfs2_lock': > > file.c

Re: [Cluster-devel] [PATCH] dlm_tool: Trimming garbages at in Expecting reply output

2013-05-02 Thread David Teigland
On Thu, May 02, 2013 at 09:19:21PM +0900, Masatake YAMATO wrote: > The buffer used in "Expecting reply" of dlm_tool lockdebug output is > used as C string (via printf %s) but not terminated with nul char. Yes, thanks. This was fixed for some time in dlm.git. I'm afraid we'll need to go through t
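
The problem reported above (a buffer printed with `%s` that is not nul-terminated) can be sketched in plain C as follows. How dlm.git actually fixed it is not shown in this snippet, so the helper below, `example_copy_terminated()`, is a hypothetical illustration of one common remedy rather than the real change.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy at most src_len bytes from a buffer that may lack a terminating
 * nul into dst, and always terminate dst. Returns the number of bytes
 * copied, not counting the terminator. Copies nothing if dst_size is 0.
 */
size_t example_copy_terminated(char *dst, size_t dst_size,
                               const char *src, size_t src_len)
{
    size_t n;

    if (dst_size == 0)
        return 0;

    /* Leave one byte of dst for the terminator. */
    n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

When only output is needed, a precision on the conversion avoids the copy: `printf("%.*s", (int)len, buf)` prints at most `len` bytes and does not read past them.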

Re: [Cluster-devel] GFS2: Pull request (fixes)

2013-04-05 Thread David Teigland
On Fri, Apr 05, 2013 at 11:34:45AM +0100, Steven Whitehouse wrote: > Please consider pulling the following changes, There's some mixup here that should be cleared up first. > David Teigland (2): > GFS2: Fix unlock of fcntl locks during withdrawn state > >

[Cluster-devel] [PATCH] gfs2: use kmalloc for lvb bitmap

2013-03-05 Thread David Teigland
The temp lvb bitmap was on the stack, which could be an alignment problem for __set_bit_le. Use kmalloc for it instead. Signed-off-by: David Teigland --- fs/gfs2/incore.h | 1 + fs/gfs2/lock_dlm.c | 31 ++- 2 files changed, 19 insertions(+), 13 deletions(-) diff

Re: [Cluster-devel] DLM regression in 64-bit 3.7.x Kernel?

2013-02-25 Thread David Teigland
On Tue, Feb 19, 2013 at 11:55:14AM +0100, Jacek Konieczny wrote: > Hi, > > I have recently upgraded my development cluster from 3.6.x to 3.7.1 > kernel and clvmd stopped working (all locking operation result in 'Invalid > argument'). I have traced the problem to this call: > > write(8, > "\6\0\0

Re: [Cluster-devel] [PATCH] idr: fix a subtle bug in idr_get_next()

2013-02-05 Thread David Teigland
; > id += slot_distance. > > > > Signed-off-by: Tejun Heo > > Reported-by: David Teigland > > Cc: KAMEZAWA Hiroyuki > > David, can you please test whether the patch makes the skipped > deletion bug go away? Yes, I've tested, and it works fine now. Thanks, Dave

Re: [Cluster-devel] [PATCH 10/14] dlm: don't use idr_remove_all()

2013-02-01 Thread David Teigland
On Thu, Jan 31, 2013 at 04:18:41PM -0800, Tejun Heo wrote: > It looks a bit weird to me that ls->ls_recover_list_count is also > incremented by recover_list_add(). The two code paths don't seem to > be interlocked at least upon my very shallow glance. Is it that only > either the list or idr is in

Re: [Cluster-devel] [PATCH 10/14] dlm: don't use idr_remove_all()

2013-01-30 Thread David Teigland
On Tue, Jan 29, 2013 at 10:13:17AM -0500, David Teigland wrote: > On Mon, Jan 28, 2013 at 10:57:23AM -0500, David Teigland wrote: > > On Fri, Jan 25, 2013 at 05:31:08PM -0800, Tejun Heo wrote: > > > idr_destroy() can destroy idr by itself and idr_remove_all() is bein

Re: [Cluster-devel] [PATCH 10/14] dlm: don't use idr_remove_all()

2013-01-29 Thread David Teigland
On Mon, Jan 28, 2013 at 10:57:23AM -0500, David Teigland wrote: > On Fri, Jan 25, 2013 at 05:31:08PM -0800, Tejun Heo wrote: > > idr_destroy() can destroy idr by itself and idr_remove_all() is being > > deprecated. > > > > The conversion isn't completely tr

Re: [Cluster-devel] [PATCH 10/14] dlm: don't use idr_remove_all()

2013-01-28 Thread David Teigland
e use of > idr_remove_all() w/o idr_destroy(). Replace it with idr_remove() call > inside idr_for_each_entry() loop. It goes on top so that it matches > the operation order in recover_idr_del(). > > Only compile tested. > > Signed-off-by: Tejun Heo > Cc: Christine Cau

Re: [Cluster-devel] [PATCH 09/14] dlm: use idr_for_each_entry() in recover_idr_clear() error path

2013-01-28 Thread David Teigland
emove_all(). > > Only compile tested. > > Signed-off-by: Tejun Heo > Cc: Christine Caulfield > Cc: David Teigland > Cc: cluster-devel@redhat.com > --- > This patch depends on an earlier idr patch and I think it would be > best to route these together through -mm. Christine, David, can you > please ack this? Ack

[Cluster-devel] gfs2: fix skip unlock condition

2013-01-03 Thread David Teigland
The recent commit fb6791d100d1bba20b5cdbc4912e1f7086ec60f8 included the wrong logic. The lvbptr check was incorrectly added after the patch was tested. Signed-off-by: David Teigland --- fs/gfs2/lock_dlm.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/gfs2

[Cluster-devel] [PATCH 1/2] gfs2: only use lvb on glocks that need it

2012-11-14 Thread David Teigland
Save the effort of allocating, reading and writing the lvb for most glocks that do not use it. Signed-off-by: David Teigland --- fs/gfs2/glock.c| 27 +-- fs/gfs2/glops.c|3 ++- fs/gfs2/incore.h |3 ++- fs/gfs2/lock_dlm.c | 12 +++- 4 files

[Cluster-devel] [PATCH 2/2] gfs2: remove redundant lvb pointer

2012-11-14 Thread David Teigland
The lksb struct already contains a pointer to the lvb, so another directly from the glock struct is not needed. Signed-off-by: David Teigland --- fs/gfs2/glock.c| 10 -- fs/gfs2/incore.h |1 - fs/gfs2/lock_dlm.c |8 fs/gfs2/quota.c|6 +++--- fs/gfs2

[Cluster-devel] [PATCH] gfs2: skip dlm_unlock calls in unmount

2012-11-13 Thread David Teigland
ck is called because it may update the lvb of the resource. Signed-off-by: David Teigland --- fs/gfs2/glock.c|1 + fs/gfs2/incore.h |1 + fs/gfs2/lock_dlm.c |8 3 files changed, 10 insertions(+) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index e6c2fd5..f3a5edb 100644 ---

Re: [Cluster-devel] [PATCH] gfs2: skip dlm_unlock calls in unmount

2012-11-12 Thread David Teigland
On Mon, Nov 12, 2012 at 10:44:36AM +, Steven Whitehouse wrote: > > - save 64 bytes of memory for every local lock > > (32 in gfs2_glock, 32 in dlm_rsb) > > > > - save 96 bytes of memory for every remote lock > > (32 in gfs2_glock, 32 in local dlm_rsb, 32 in remote dlm_lkb) > > > > - save

Re: [Cluster-devel] [PATCH] gfs2: skip dlm_unlock calls in unmount

2012-11-09 Thread David Teigland
On Fri, Nov 09, 2012 at 09:45:17AM +, Steven Whitehouse wrote: > > + if (test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags) && > > + (!gl->gl_lvb[0] || gl->gl_state != LM_ST_EXCLUSIVE)) { > I'm still not happy with using !gl->gl_lvb[0] to determine whether the > LVB is in use or not. I think

[Cluster-devel] [PATCH] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread David Teigland
ck is called because it may update the lvb of the resource. Signed-off-by: David Teigland --- fs/gfs2/glock.c|1 + fs/gfs2/incore.h |1 + fs/gfs2/lock_dlm.c |8 3 files changed, 10 insertions(+) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index e6c2fd5..f3a5edb 100644 ---

Re: [Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread David Teigland
On Thu, Nov 08, 2012 at 06:48:19PM +, Steven Whitehouse wrote: > > Converting to NL would actually be less expensive than unlock because the > > NL convert does not involve a reply message, but unlock does. > > > I'm not entirely sure I follow... at least from the filesystem point of > view (a
