On Thu, Nov 08, 2012 at 10:26:53AM +, Steven Whitehouse wrote:
> Hi,
>
> On Wed, 2012-11-07 at 14:14 -0500, David Teigland wrote:
> > When unmounting, gfs2 does a full dlm_unlock operation on every
> > cached lock. This can create a very large amount of work and can
>
ck is called because it may update the
lvb of the resource.
Signed-off-by: David Teigland
---
fs/gfs2/glock.c|1 +
fs/gfs2/incore.h |1 +
fs/gfs2/lock_dlm.c |6 ++
3 files changed, 8 insertions(+)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index e6c2fd5..f3a5edb 100644
---
On Mon, Nov 05, 2012 at 07:05:22PM +0100, Jacek Konieczny wrote:
> - rv = stonith_api_kick_helper(nodeid, 300, 1);
> + rv = stonith_api_kick_helper(nodeid, 300, turn_off);
I'd like it to be "reboot", but seeing the arg as "bool off" I figured the
opposite would be "on" ... if you're saying
On Sat, Nov 03, 2012 at 04:27:54PM +0100, Jacek Konieczny wrote:
> Hello,
>
> The two patches:
>
>[PATCH 1/2] --foreground option added to dlm_controld
>[PATCH 2/2] Startup notification by sd_notify()
>
> add startup notification for the systemd service unit. This way startup
> of servic
On Sat, Nov 03, 2012 at 03:58:28PM +0100, Jacek Konieczny wrote:
> Hello,
>
> The dlm_stonith fencing helper is really convenient when Pacemaker is in
> use. Though, it doesn't quite work as I would expect: when fencing
> is needed it requests a node to be turned off instead of rebooting. And
On Wed, Oct 03, 2012 at 04:55:55PM +, Dietmar Maurer wrote:
> > The difficult cases, which I think you're seeing, are partitions where
> > no group has quorum, e.g. 2/2. In this case we do nothing, and the
> > user has to resolve it by resetting some of the nodes
>
> The problem with that is
On Wed, Oct 03, 2012 at 04:26:35PM +, Dietmar Maurer wrote:
> > I guess you're talking about the dlm_tool ls output?
>
> Yes.
>
> > The "fencing" there
> > means it is waiting for fenced to finish fencing before it starts dlm
> > recovery.
> > fenced waits for quorum.
>
> So who actually s
On Wed, Oct 03, 2012 at 04:12:10PM +, Dietmar Maurer wrote:
> > Yes, it's a stateful partition merge, and I think /var/log/messages should
> > have
> > mentioned something about that. When a node is partitioned from the
> > others (e.g. network disconnected), it has to be cleanly reset before
On Wed, Oct 03, 2012 at 09:25:08AM +, Dietmar Maurer wrote:
> So the observed behavior is expected?
Yes, it's a stateful partition merge, and I think /var/log/messages should
have mentioned something about that. When a node is partitioned from the
others (e.g. network disconnected), it has t
On Sun, Sep 09, 2012 at 04:16:58PM +0200, Sasha Levin wrote:
> device_write only checks whether the request size is big enough, but it
> doesn't
> check if the size is too big.
>
> At that point, it also tries to allocate as much memory as the user has
> requested
> even if it's too much. This c
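The shape of the missing check, as a sketch; the surrounding handler and the exact bound shown here are illustrative, not the actual fs/dlm/user.c fix:

#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/uaccess.h>
#include <linux/dlm_device.h>
#include <linux/dlmconstants.h>

static ssize_t device_write(struct file *file, const char __user *buf,
			    size_t count, loff_t *ppos)
{
	void *kbuf;

	if (count < sizeof(struct dlm_write_request))
		return -EINVAL;			/* existing "too small" check */

	if (count > sizeof(struct dlm_write_request) + DLM_RESNAME_MAXLEN)
		return -EINVAL;			/* missing "too big" check */

	kbuf = kzalloc(count + 1, GFP_KERNEL);	/* now a bounded allocation */
	if (!kbuf)
		return -ENOMEM;
	if (copy_from_user(kbuf, buf, count)) {
		kfree(kbuf);
		return -EFAULT;
	}
	/* process the request here */
	kfree(kbuf);
	return count;
}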
On Mon, Aug 27, 2012 at 01:43:22PM +0200, Heiko Nardmann wrote:
> Hi together!
>
> During the shutdown of my second cluster node (two node cluster) I
> have seen a process 'dlm_recoverd' running with 100% CPU usage for
> about 6 minutes.
>
> It's just that I have no idea what is the task of this
On Wed, Jul 25, 2012 at 07:32:28AM +0200, Fabio M. Di Nitto wrote:
> From: "Fabio M. Di Nitto"
>
> Resolves: rhbz#842370
looks good, thanks
> +# DLM_LKBTBL_SIZE - DLM_RSBTBL_SIZE - DLM_DIRTBL_SIZE
> +# Allow tuning of DLM kernel hash table sizes.
> +# do NOT change unless instructed to do so.
On Mon, May 21, 2012 at 05:35:26PM +0300, Dan Carpenter wrote:
> Smatch complains that we unlock this twice. It looks accidental
> to me.
Thanks, will fix that.
On Tue, May 15, 2012 at 11:58:12AM +0300, Dan Carpenter wrote:
> We aren't allowed to pass NULL pointers to kmem_cache_destroy() so if
> both allocations fail, it leads to a NULL dereference.
thanks, added that to next branch.
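A sketch of the failure path being described, with hypothetical cache names; the point is that a pointer left NULL by a failed allocation must never reach kmem_cache_destroy():

#include <linux/slab.h>

static struct kmem_cache *lkb_cache;
static struct kmem_cache *rsb_cache;

static int caches_init(void)
{
	lkb_cache = kmem_cache_create("dlm_lkb", 128, 0, 0, NULL);
	rsb_cache = kmem_cache_create("dlm_rsb", 256, 0, 0, NULL);

	if (!lkb_cache || !rsb_cache) {
		/* only destroy the cache that was actually created */
		if (lkb_cache)
			kmem_cache_destroy(lkb_cache);
		if (rsb_cache)
			kmem_cache_destroy(rsb_cache);
		return -ENOMEM;
	}
	return 0;
}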
On Fri, May 04, 2012 at 11:33:17AM -0600, dann frazier wrote:
> On Fri, Mar 30, 2012 at 11:17:56AM -0600, dann frazier wrote:
> > On Fri, Mar 30, 2012 at 12:42:40PM -0400, David Teigland wrote:
> > > On Fri, Mar 30, 2012 at 11:42:56AM -0400, David Teigland wrote:
> >
On Tue, Apr 10, 2012 at 10:12:28AM +0100, Steven Whitehouse wrote:
> Hi,
>
> On Thu, 2012-04-05 at 12:11 -0400, Bob Peterson wrote:
> > Hi,
> >
> > Here's another patch (explanation below). This patch relies upon
> > a DLM patch that hasn't fully gone upstream yet, so perhaps it
> > shouldn't be
On Fri, Mar 30, 2012 at 11:42:56AM -0400, David Teigland wrote:
> Hi Dan, I'm not very familiar with this code either, but I've talked with
> Chrissie and she suggested we try something like this:
A second version that addresses a potentially similar problem in start.
dif
On Wed, Mar 21, 2012 at 07:59:13PM -0600, dann frazier wrote:
> However... we've dropped the connections_lock, so its possible that a
> new connection gets created on line 9. This connection structure would
> have pointers to the workqueues that we're about to destroy. Sometime
> later on we get da
On Fri, Mar 23, 2012 at 01:06:05PM -0700, Randy Dunlap wrote:
> >> GFS2_FS selects DLM (if GFS2_FS_LOCKING_DLM, which is enabled).
> >> GFS2_FS selects IP_SCTP if DLM_SCTP, which is not enabled and not
> >> used anywhere else in the kernel tree AFAICT.
> >> DLM just always selects IP_SCTP.
> >
> >
> on i386:
>
> ERROR: "sctp_do_peeloff" [fs/dlm/dlm.ko] undefined!
>
>
> GFS2_FS selects DLM (if GFS2_FS_LOCKING_DLM, which is enabled).
> GFS2_FS selects IP_SCTP if DLM_SCTP, which is not enabled and not
> used anywhere else in the kernel tree AFAICT.
> DLM just always selects IP_SCTP.
Here's wh
On Wed, Mar 21, 2012 at 12:24:35PM +0300, Dan Carpenter wrote:
> In fs/dlm/lowcomms.c we declare the dlm_local_addr[] array like
> this:
> static struct sockaddr_storage *dlm_local_addr[DLM_MAX_ADDR_COUNT];
>
> But it looks like the last element of the array is never used:
>
> 1072 /* Get loca
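A generic illustration of what Dan is pointing at (names and size are invented for the example, not the actual lowcomms.c code): an array declared with N slots whose fill path stops at N - 1 can never use its last element:

#include <linux/socket.h>

#define MAX_ADDRS 16

static struct sockaddr_storage *addrs[MAX_ADDRS];
static int addr_count;

static void save_addr(struct sockaddr_storage *sas)
{
	/* bound is MAX_ADDRS - 1, so addrs[MAX_ADDRS - 1] is never written */
	if (addr_count >= MAX_ADDRS - 1)
		return;
	addrs[addr_count++] = sas;
}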
On Thu, Feb 16, 2012 at 02:55:21PM +0100, Danny Kukawka wrote:
> fs/dlm/rcom.c included 'member.h' twice, remove the duplicate.
I'll fold this into the current patch I'm working on.
>
> Signed-off-by: Danny Kukawka
> ---
> fs/dlm/rcom.c |1 -
> 1 files changed, 0 insertions(+), 1 deletions
ink
the current merge cycle would be good, but you can send it off whenever
you think is right.
Dave
From 0fb2d7726b570c6a5eb289bac237fb384b9c6f0b Mon Sep 17 00:00:00 2001
From: David Teigland
Date: Tue, 20 Dec 2011 17:03:04 -0600
Subject: [PATCH] gfs2: dlm based recovery coordination
If the first mounter fails to recover one of the journals
during mount, the mount should fail.
Signed-off-by: David Teigland
---
fs/gfs2/incore.h |1 +
fs/gfs2/recovery.c |3 ++-
2 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index
only mount on a read only
block device.
Signed-off-by: David Teigland
---
fs/gfs2/incore.h |1 +
fs/gfs2/ops_fstype.c |2 +-
fs/gfs2/recovery.c |4 +++-
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 9182a87..59114c5 10064
On Mon, Jan 09, 2012 at 11:46:26AM -0500, David Teigland wrote:
> On Mon, Jan 09, 2012 at 04:36:30PM +, Steven Whitehouse wrote:
> > On Thu, 2012-01-05 at 10:46 -0600, David Teigland wrote:
> > > This new method of managing recovery is an alternative to
> > > the pre
On Mon, Jan 09, 2012 at 04:36:30PM +, Steven Whitehouse wrote:
> On Thu, 2012-01-05 at 10:46 -0600, David Teigland wrote:
> > This new method of managing recovery is an alternative to
> > the previous approach of using the userland gfs_controld.
> >
> > - use
On Thu, Jan 05, 2012 at 04:58:22PM +, Steven Whitehouse wrote:
> > + clear_bit(SDF_NOJOURNALID, &sdp->sd_flags);
> > + smp_mb__after_clear_bit();
> > + wake_up_bit(&sdp->sd_flags, SDF_NOJOURNALID);
> > + ls->ls_first = !!test_bit(DFL_FIRST_MOUNT, &ls->ls_recover_flags);
> > + return 0
their
kernel counterparts. These callbacks allow the same
coordination directly, and more simply.
Signed-off-by: David Teigland
---
fs/dlm/config.c | 130 ++--
fs/dlm/config.h | 17 +++-
fs/dlm/dlm_internal.h | 21 ++
fs/dlm/lockspace.c| 43
ck to track journals that need recovery
Signed-off-by: David Teigland
---
fs/gfs2/glock.c |2 +-
fs/gfs2/glock.h |7 +-
fs/gfs2/incore.h| 58 +++-
fs/gfs2/lock_dlm.c | 993 ++-
fs/gfs2/m
From: Bob Peterson
Change the linked lists to rb_tree's in the rsb
hash table to speed up searches. Slow rsb searches
were having a large impact on gfs2 performance due
to the large number of dlm locks gfs2 uses.
Signed-off-by: Bob Peterson
Signed-off-by: David Teigland
---
f
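A minimal sketch of why the rb_tree conversion helps, using simplified names and a plain strcmp() key rather than the actual rsb comparison: lookups walk a balanced tree instead of a linked list:

#include <linux/rbtree.h>
#include <linux/string.h>

struct rsb_node {
	struct rb_node node;
	char name[64];
};

/* O(log n) search by resource name, replacing a linear list walk. */
static struct rsb_node *rsb_search(struct rb_root *root, const char *name)
{
	struct rb_node *n = root->rb_node;

	while (n) {
		struct rsb_node *r = rb_entry(n, struct rsb_node, node);
		int cmp = strcmp(name, r->name);

		if (cmp < 0)
			n = n->rb_left;
		else if (cmp > 0)
			n = n->rb_right;
		else
			return r;
	}
	return NULL;
}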
userland. This new feature is not used by current dlm_controld and
gfs_controld daemons, but will be enabled by a new dlm_controld version
under development.
Bob Peterson (1):
dlm: convert rsb list to rb_tree
David Teigland (4):
dlm: move recovery barrier calls
dlm: add node slots
slot number.
A new generation number is also added to a lockspace. It is
set and incremented during each recovery along with the slot
collection/assignment.
The slot numbers will be passed to gfs2 which will use them as
journal id's.
Signed-off-by: David Teigland
---
fs/dlm/dlm_internal.h |
Put all the calls to recovery barriers in the same function
to clarify where they each happen. Should not change any behavior.
Also modify some recovery debug lines to make them consistent.
Signed-off-by: David Teigland
---
fs/dlm/dir.c |1 -
fs/dlm/member.c |7 +--
fs/dlm
On Thu, Jan 05, 2012 at 03:40:09PM +, Steven Whitehouse wrote:
> I think it would be a good plan to not send this last patch for the
> current merge window and let it settle for a bit longer. Running things
> so fine with the timing makes me nervous bearing in mind the number of
> changes,
To
> | - use dlm recovery callbacks to initiate journal recovery
> | - use a dlm lock to determine the first node to mount fs
> | - use a dlm lock to track journals that need recovery
> |
> | Signed-off-by: David Teigland
> | ---
> | --- a/fs/gfs2/lock_dlm.c
> | +++ b/fs/gfs2/lock
> [patch] dlm: le32 vs le16
> gfs2: make some sizes unsigned in set_recover_size()
Thanks, I've folded in both of those.
Dave
On Mon, Dec 19, 2011 at 12:47:38PM -0500, David Teigland wrote:
> On Mon, Dec 19, 2011 at 01:07:38PM +, Steven Whitehouse wrote:
> > > struct lm_lockstruct {
> > > int ls_jid;
> > > unsigned int ls_first;
> > > - unsigned int ls_first_done;
>
On Wed, Dec 21, 2011 at 10:45:21AM +, Steven Whitehouse wrote:
> I don't think I understand whats going on in that case. What I thought
> should be happening was this:
>
> - Try to get mounter lock in EX
>- If successful, then we are the first mounter so recover all
> journals
>-
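A sketch of the scheme under discussion; hold_mounter_lock() is a hypothetical stand-in for the real dlm request, and the non-first-mounter branch is inferred from the surrounding thread rather than quoted from gfs2:

/* Returns 1 if we are the first mounter (and must recover all journals),
 * 0 if another node mounted first, or a negative error. */
static int determine_first_mounter(void)
{
	int error;

	/* try for the mounter lock in EX without queueing */
	error = hold_mounter_lock(DLM_LOCK_EX, DLM_LKF_NOQUEUE);
	if (!error)
		return 1;

	/* not first: take the lock shared and wait for the first
	 * mounter to finish recovering the journals */
	error = hold_mounter_lock(DLM_LOCK_PR, 0);
	if (error)
		return error;
	return 0;
}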
On Tue, Dec 20, 2011 at 02:16:43PM -0500, David Teigland wrote:
> On Tue, Dec 20, 2011 at 10:39:08AM +, Steven Whitehouse wrote:
> > > I dislike arbitrary delays also, so I'm hesitant to add them.
> > > The choices here are:
> > > - removing NOQUEUE from t
On Tue, Dec 20, 2011 at 10:39:08AM +, Steven Whitehouse wrote:
> > I dislike arbitrary delays also, so I'm hesitant to add them.
> > The choices here are:
> > - removing NOQUEUE from the requests below, but with NOQUEUE you have a
> > much better chance of killing a mount command, which is a
On Mon, Dec 19, 2011 at 12:36:57PM +, Steven Whitehouse wrote:
> > + struct dlm_lockspace_ops ls_ops;
> ^^ I'd suggest just keeping a pointer to
> this, see below.
> > +static int new_lockspace(const char *name, const char *cluster, uint32_t
> > flags,
> > +
> Nit, but this should have some spaces, iow, "i + 1;"
> -error = check_config(ls, rc, nodeid);
> +error = check_rcom_config(ls, rc, nodeid);
yeah, I'll change those, thanks
On Mon, Dec 19, 2011 at 01:07:38PM +, Steven Whitehouse wrote:
> > struct lm_lockstruct {
> > int ls_jid;
> > unsigned int ls_first;
> > - unsigned int ls_first_done;
> > unsigned int ls_nodir;
> Since ls_flags and ls_first are also only boolean flags, they could
> potentially b
ck to track journals that need recovery
Signed-off-by: David Teigland
---
fs/gfs2/glock.c |2 +-
fs/gfs2/glock.h |7 +-
fs/gfs2/incore.h| 51 ++-
fs/gfs2/lock_dlm.c | 979 ++-
fs/gfs2/m
their
kernel counterparts. These callbacks allow the same
coordination directly, and more simply.
Signed-off-by: David Teigland
---
fs/dlm/config.c | 130 ++--
fs/dlm/config.h | 17 +++-
fs/dlm/dlm_internal.h | 20 ++
fs/dlm/lockspace.c| 37
Put all the calls to recovery barriers in the same function
to clarify where they each happen. Should not change any behavior.
Also modify some recovery debug lines to make them consistent.
Signed-off-by: David Teigland
---
fs/dlm/dir.c |1 -
fs/dlm/member.c |7 +--
fs/dlm
From: Bob Peterson
Change the linked lists to rb_tree's in the rsb
hash table to speed up searches. Slow rsb searches
were having a large impact on gfs2 performance due
to the large number of dlm locks gfs2 uses.
Signed-off-by: Bob Peterson
Signed-off-by: David Teigland
---
f
slot number.
A new generation number is also added to a lockspace. It is
set and incremented during each recovery along with the slot
collection/assignment.
The slot numbers will be passed to gfs2 which will use them as
journal id's.
Signed-off-by: David Teigland
---
fs/dlm/dlm_internal.h |
This is the current series of dlm patches from
https://github.com/teigland/linux-dlm/tree/devel9
The first is already pushed to linux-next for the next merge cycle.
The others, which allow gfs2 to be used without gfs_controld, are
still being tested, and may be ready for the next merge cycle,
depe
On Fri, Nov 04, 2011 at 04:57:31PM +, Steven Whitehouse wrote:
> Hi,
>
> On Fri, 2011-11-04 at 12:31 -0400, David Teigland wrote:
> > On Fri, Nov 04, 2011 at 03:19:49PM +, Steven Whitehouse wrote:
> > > The three pairs of mean/variance measure the following
>
On Fri, Nov 04, 2011 at 03:19:49PM +, Steven Whitehouse wrote:
> The three pairs of mean/variance measure the following
> things:
>
> 1. DLM lock time (non-blocking requests)
You don't need to track and save this value, because all results will be
one of three values which can be gathered once:
Hi Bob, I've made a few minor/cosmetic changes and attached my current
version (not tested yet).
> static int shrink_bucket(struct dlm_ls *ls, int b)
> {
> + struct rb_node *n = NULL;
> struct dlm_rsb *r;
> int count = 0, found;
>
> for (;;) {
> found = 0;
>
On Thu, Oct 13, 2011 at 05:16:29PM +0100, Steven Whitehouse wrote:
> Hi,
>
> On Thu, 2011-10-13 at 11:30 -0400, David Teigland wrote:
> > On Thu, Oct 13, 2011 at 03:41:31PM +0100, Steven Whitehouse wrote:
> > > > cluster4
> > > > . jid from dlm-kernel &
On Thu, Oct 13, 2011 at 03:41:31PM +0100, Steven Whitehouse wrote:
> > cluster4
> > . jid from dlm-kernel "slots" which will be assigned similarly
> What is the actual algorithm used to assign these slots?
The same as picking jids: lowest unused id starting with 0. As for
implementation, I'll add
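A minimal sketch of "lowest unused id starting with 0", assuming the ids currently in use are available as a simple array (names are illustrative):

/* Return the smallest id >= 0 that does not appear in used[]. */
static int lowest_unused_id(const int *used, int num_used)
{
	int id, i;

	for (id = 0; ; id++) {
		for (i = 0; i < num_used; i++) {
			if (used[i] == id)
				break;
		}
		if (i == num_used)
			return id;	/* no node holds this id */
	}
}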
On Fri, Oct 14, 2011 at 12:02:27AM +0900, Masatake YAMATO wrote:
> Just a question.
> I'm happy if you give me a hint.
>
> > ...
> > cluster3 dlm/gfs recovery
> > . dlm_controld sees nodedown (libcpg)
> > . gfs_controld sees nodedown (libcpg)
> > . dlm_con
Here's the outline of my plan to remove/replace the essential bits of
gfs_controld in cluster4. I expect it'll go away entirely, but there
could be one or two minor things it would still handle on the side.
kernel dlm/gfs2 will continue to be operable with either
. cluster3 dlm_controld/gfs_contr
On Mon, Oct 10, 2011 at 08:00:07PM +0100, Steven Whitehouse wrote:
> > The fact remains that caching "as much as possible" tends to be harmful,
> > and some careful limiting would be a good investment.
> >
> There is a limit. The point is that the limit is dynamic and depends on
> memory pressure.
On Mon, Oct 10, 2011 at 04:51:01PM +0100, Steven Whitehouse wrote:
> Hi,
>
> On Mon, 2011-10-10 at 10:43 -0400, David Teigland wrote:
> > On Sat, Oct 08, 2011 at 06:13:52AM -0400, Bob Peterson wrote:
> > > - Original Message -
> > > | On Wed, Oct 05, 2011
On Mon, Oct 10, 2011 at 10:45:17AM +0200, Fabio M. Di Nitto wrote:
> This is the first patchset to address some issues spotted by Coverity scan.
look fine
On Sat, Oct 08, 2011 at 06:13:52AM -0400, Bob Peterson wrote:
> - Original Message -
> | On Wed, Oct 05, 2011 at 03:25:39PM -0400, Bob Peterson wrote:
> | > Hi,
> | >
> | > This upstream patch changes the way DLM keeps track of RSBs.
> | > Before, they were in a linked list off a hash tabl
On Thu, Oct 06, 2011 at 08:02:10PM +0200, Fabio M. Di Nitto wrote:
> Hi David,
>
> this is going to need another quick pass.
>
> The libdlm headers are fine, but for the daemon/tool, we had GPLv2+ in
> STABLE31 and current header only reflects GPLv2.
I'm defaulting to plain v2 unless there's a r
On Wed, Oct 05, 2011 at 03:25:39PM -0400, Bob Peterson wrote:
> Hi,
>
> This upstream patch changes the way DLM keeps track of RSBs.
> Before, they were in a linked list off a hash table. Now,
> they're an rb_tree off the same hash table. This speeds up
> DLM lookups greatly.
>
> Today's DLM is
> dlm/libdlm/libdlm.pc.in | 11 -
> dlm/libdlm/libdlm_lt.pc.in| 11 -
>
> dropping the .pc file is going to break dlm users.
>
> pc files are used by different build systems (not just
> autotools/autoconf) to detect libdlm and link against it correctly.
>
> Similar
On Fri, Sep 30, 2011 at 01:07:01PM +0200, Fabio M. Di Nitto wrote:
> On 09/30/2011 12:02 AM, David Teigland wrote:
>
> > add a normal, sane Makefile
>
> If you plan to drop autoconf+autotool, that is your call (I disagree for
> several reasons, but it's your project a
the incarnation numbers of members
from consecutive queries to avoid this.
bz 663397
Signed-off-by: David Teigland
---
group/gfs_controld/member_cman.c | 51 +++---
1 files changed, 47 insertions(+), 4 deletions(-)
diff --git a/group/gfs_controld/member_cman.c
the incarnation numbers of members
from consecutive queries to avoid this.
bz 663397
Signed-off-by: David Teigland
---
fence/fenced/member_cman.c | 36
1 files changed, 36 insertions(+), 0 deletions(-)
diff --git a/fence/fenced/member_cman.c b/fence
the incarnation numbers of members
from consecutive queries to avoid this.
bz 663397
Signed-off-by: David Teigland
---
group/dlm_controld/member_cman.c | 79 --
1 files changed, 75 insertions(+), 4 deletions(-)
diff --git a/group/dlm_controld/member_cman.c
On Sat, Sep 24, 2011 at 07:13:34AM +0200, Fabio M. Di Nitto wrote:
> Quick question.. deadlock.c/netlink.c have been dropped from the build
> and not referenced anywhere for distribution. Is it a plan to kill them
> completely or do they need porting?
I'm going to leave the files there as artifact
> >> When node A starts back up, the SCTP protocol notices this (as it's
> >> supposed to), and delivers an SCTP_ASSOC_CHANGE / SCTP_RESTART
> >> notification to the SCTP socket, telling the socket owner (the dlm_recv
> >> thread) that the other node has restarted. DLM responds by telling SCTP
> >
On Tue, Sep 06, 2011 at 01:00:16PM +0100, Andrew Price wrote:
> This patch removes an if statement where the true branch is never taken.
> At this point in the code, poll_timeout could only be 500 or -1.
>
> Signed-off-by: Andrew Price
> ---
> group/gfs_controld/main.c |3 ---
> 1 files chan
. This is because the kernel
generates extraneous plock unlock requests
when files are closed with flocks. Because
dlm_controld finds no plocks on the files,
it replies to the kernel with an error, rather
than skipping the reply to do CLOSE.
bz 731775
Signed-off-by: David Teigland
---
group/dl
On Sun, Jul 10, 2011 at 10:54:31PM +0200, Jesper Juhl wrote:
> In fs/dlm/lock.c in the dlm_scan_waiters() function there are 3 small
> issues:
>
> 1) There's no need to test the return value of the allocation and do a
> memset if it succeeds. Just use kzalloc() to obtain zeroed memory.
>
> 2) Si
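A sketch of item (1), with illustrative names: kzalloc() returns already-zeroed memory, so the separate memset() after a successful kmalloc() can be dropped:

#include <linux/slab.h>

static int *alloc_warned(int num_nodes)
{
	/* before:
	 *   warned = kmalloc(num_nodes * sizeof(int), GFP_KERNEL);
	 *   if (warned)
	 *           memset(warned, 0, num_nodes * sizeof(int));
	 */
	return kzalloc(num_nodes * sizeof(int), GFP_KERNEL);
}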
On Wed, Jul 06, 2011 at 12:14:26PM -0400, David Teigland wrote:
> Request for testing
>
> I'm looking at possible improvements to the dlm hash tables.
I've pushed this and another patch related to hash table performance to
the tmp-testing branch,
git://git.kernel.org/pub/s
Request for testing
I'm looking at possible improvements to the dlm hash tables. This patch
keeps lkbs in an idr instead of a hash table. Before pushing this patch
further, I'd like to know if it makes any difference in environments using
millions of locks on each node.
From: Davi
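A sketch of the idea, using the present-day idr interface purely as an illustration (the 2011 patch would have used the older idr calls); names are illustrative:

#include <linux/idr.h>
#include <linux/gfp.h>

static DEFINE_IDR(lkb_idr);

/* Store an lkb and return its lock id (>= 1), or a negative errno. */
static int lkb_idr_insert(void *lkb)
{
	return idr_alloc(&lkb_idr, lkb, 1, 0, GFP_KERNEL);
}

/* Lookup by lock id, replacing a hash-table walk. */
static void *lkb_idr_lookup(int lkid)
{
	return idr_find(&lkb_idr, lkid);
}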
On Wed, Jun 29, 2011 at 11:51:00PM +0200, Jesper Juhl wrote:
> > I don't think so; num_nodes won't be set to zero.
>
> Hmm. How so? Maybe I'm missing something obvious, but;
> num_nodes is initialized to zero at the beginning of the function, which
> means that we'll definitely do the first allo
On Wed, Jun 29, 2011 at 11:09:27PM +0200, Jesper Juhl wrote:
> In fs/dlm/lock.c in the dlm_scan_waiters() function there are 3 small
> issues:
>
> 1) first time through the loop we allocate memory for 'warned', if we
> then (in the loop) don't take the "if (!warned)" path and loop again,
> the sec
On Fri, May 27, 2011 at 07:44:03AM +0800, Jiaju Zhang wrote:
> This series introduces an RB tree for improving plock resource searching
> efficiency. We met this performance issue when running Samba on top of
> cluster filesystem, profiling during nbench runs with num-progs=500, the
> dlm_controld
> > Cc: Christine Caulfield
> > Cc: David Teigland
> > Cc: cluster-devel@redhat.com
> > Signed-off-by: Michal Marek
> > ---
> > fs/dlm/main.c |2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
>
> Hi,
>
> I don't see this
when the
process is killed. So the unlock-close also needs to clear
any waiting plocks that were abandoned by the killed process.
The corresponding kernel patch:
https://lkml.org/lkml/2011/5/23/237
Signed-off-by: David Teigland
---
group/dlm_controld/plock.c | 28
1
On Thu, Mar 24, 2011 at 01:56:47PM +, Matt Fleming wrote:
> From: Matt Fleming
>
> recalc_sigpending() is called within sigprocmask(), so there is no
> need call it again after sigprocmask() has returned.
Thanks, pushed to dlm.git next.
Dave
On Sat, Mar 19, 2011 at 07:34:55AM +0100, Fabio M. Di Nitto wrote:
> My suggestion would be to allow to specify a list of ports instead.
This comes up now and then. The current rule of one action per agent
execution is a tried and true, fundamental property of the agent api.
It should not be chan
ded for the node, causing dlm_controld to wait indefinitely for
fencing to complete for the reduced victim.
The fix is to simply record the information from a victim_done message
even if the node is not in the victims list.
bz 678704
Signed-off-by: David Teigland
---
fence/fenced/cpg.c | 18 ++
On Tue, Feb 22, 2011 at 04:35:42PM +0800, Jiaju Zhang wrote:
> On Tue, Nov 9, 2010 at 6:06 AM, David Teigland wrote:
> > On Mon, Nov 08, 2010 at 11:05:49PM +0800, Jiaju Zhang wrote:
> >> Luckily, things have changed now. One user met this issue two months
> >> ago a
On Thu, Feb 03, 2011 at 01:26:07PM -0600, Ryan O'Hara wrote:
> This patch adds the ability to send a dbus signal when a node is fenced.
> This code can reestablish a connection with dbus if necessary.
ACK
> +void dbus_init (void)
No space before (
Also, it would be a good idea to put a fenced-specific prefix before
fenced's own dbus functions, e.g. fd_dbus_init(), because dbus_ is the
dbus lib's namespace and open to symbol collisions.
> +{
> +#ifdef DBUS
> +
> +if (!(bus = dbus_bus_get_priva
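A sketch of the naming suggestion, with the body reduced to the calls already visible in the patch: keep fenced's own helpers in an fd_dbus_* namespace so they stay clear of libdbus's dbus_* symbols:

#include <dbus/dbus.h>

static DBusConnection *bus;

void fd_dbus_init(void)
{
	bus = dbus_bus_get_private(DBUS_BUS_SYSTEM, NULL);
	if (!bus)
		return;
	/* keep running if the dbus connection drops; reconnect later */
	dbus_connection_set_exit_on_disconnect(bus, 0);
}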
On Tue, Jan 04, 2011 at 06:06:51PM -0200, cmaiol...@redhat.com wrote:
> The resource groups got corrupted without this patch:
I could see an extraneous bast leading to confusion in gfs2 about the lock
state, but gfs2 should probably be asserting somewhere before it actually
corrupts anything...
>
On Tue, Dec 14, 2010 at 12:28:25AM +0900, Namhyung Kim wrote:
> The create_workqueue() returns NULL if failed rather than ERR_PTR().
> Fix error checking and remove unnecessary variable 'error'.
I adapted this to the alloc_workqueue patch in next and pushed to next.
Dave
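A sketch of the rule being applied (queue name illustrative): create_workqueue() returns NULL on failure, so the error path must test for NULL rather than IS_ERR():

#include <linux/workqueue.h>

static struct workqueue_struct *callback_wq;

static int start_callback_workqueue(void)
{
	callback_wq = create_workqueue("dlm_callback");
	if (!callback_wq)	/* NULL on failure, never an ERR_PTR */
		return -ENOMEM;
	return 0;
}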
On Wed, Dec 01, 2010 at 10:23:25AM +0100, Menyhart Zoltan wrote:
> If we cannot obtain a given resource within a limited time frame,
> then it is a real error for the customer: s/he cannot mount an OCFS2
> volume, cannot issue a cluster command, etc.
Matter of opinion and preference I suppose.
>
On Tue, Nov 30, 2010 at 05:57:50PM +0100, Menyhart Zoltan wrote:
> Hi,
>
> An easy first step to make DLM more robust can be adding a time out protection
> to the lock space creation operation, while waiting for a "dlm_controld"
> action.
> A new member "ci_dlm_controld_secs" is added to "dlm_con
On Wed, Nov 24, 2010 at 05:13:40PM +0100, Menyhart Zoltan wrote:
> Could you please indicate the exact URL?
The current fedora packages,
or
https://www.redhat.com/archives/cluster-devel/2010-October/msg8.html
or
http://git.fedorahosted.org/git/?p=cluster.git;a=shortlog;h=refs/heads/STABLE31
>
On Tue, Nov 23, 2010 at 03:58:42PM +0100, Menyhart Zoltan wrote:
> David Teigland wrote:
> >On Mon, Nov 22, 2010 at 05:31:25PM +0100, Menyhart Zoltan wrote:
> >>We have got a two-node OCFS2 file system controlled by the pacemaker.
> >
> >Are you using dlm_contro
On Mon, Nov 22, 2010 at 05:31:25PM +0100, Menyhart Zoltan wrote:
> We have got a two-node OCFS2 file system controlled by the pacemaker.
Are you using dlm_controld.pcmk? If so, please try the latest versions of
pacemaker that use the standard dlm_controld. The problem may be related
to the locks
On Fri, Nov 12, 2010 at 04:20:35PM +, Steven Whitehouse wrote:
> Hi,
>
> On Fri, 2010-11-12 at 11:12 -0500, David Teigland wrote:
> > On Fri, Nov 12, 2010 at 12:12:29PM +, Steven Whitehouse wrote:
> > >
> > > So far as I can tell, there is no reason to
On Wed, Nov 10, 2010 at 09:56:39PM -0800, David Miller wrote:
>
> In the normal regime where an application uses non-blocking I/O
> writes on a socket, they will handle -EAGAIN and use poll() to
> wait for send space.
>
> They don't actually sleep on the socket I/O write.
>
> But kernel level RP
On Fri, Nov 12, 2010 at 12:12:29PM +, Steven Whitehouse wrote:
>
> So far as I can tell, there is no reason to use a single-threaded
> send workqueue for dlm, since it may need to send to several sockets
> concurrently. Both workqueues are set to WQ_MEM_RECLAIM to avoid
> any possible deadlock
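A sketch of the setup being described, assuming the usual dlm queue names: ordinary (not single-threaded) workqueues created with WQ_MEM_RECLAIM so sends to several sockets can proceed concurrently while still making progress under memory pressure:

#include <linux/workqueue.h>

static struct workqueue_struct *recv_workqueue;
static struct workqueue_struct *send_workqueue;

static int work_start(void)
{
	/* not create_singlethread_workqueue(): allow concurrent sends */
	recv_workqueue = alloc_workqueue("dlm_recv", WQ_MEM_RECLAIM, 0);
	send_workqueue = alloc_workqueue("dlm_send", WQ_MEM_RECLAIM, 0);
	if (!recv_workqueue || !send_workqueue)
		return -ENOMEM;
	return 0;
}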
On Mon, Nov 08, 2010 at 11:05:49PM +0800, Jiaju Zhang wrote:
> Luckily, things have changed now. One user met this issue two months
> ago and he's also very kindly to test the patch. The result is the
> patch really works.
>
> Attached is the log before they apply the patch. This time the log
> ha
also speeds up GFS as well. In the GFS2 case the performance gain
is over 10x for cases of write activity to an inode whose glock is cached
on another, idle (wrt that glock) node.
(comment added, dct)
Signed-off-by: Steven Whitehouse
Tested-by: Abhijith Das
Signed-off-by: David Teigland
---
fs/dl
On Mon, Jul 26, 2010 at 05:19:19PM +0800, Changli Gao wrote:
> Signed-off-by: Changli Gao
>
> fs/dlm/netlink.c | 15 +--
> 1 file changed, 1 insertion(+), 14 deletions(-)
> diff --git a/fs/dlm/netlink.c b/fs/dlm/netlink.c
> index 2c6ad51..ef17e01 100644
> --- a/fs/dlm/netlink.c
On Tue, Jun 08, 2010 at 09:34:52AM +0100, Steven Whitehouse wrote:
> > A couple obvious questions from the start...
> > - What if gfs_controld isn't running?
> It will hang until mount is killed, whereupon it will clean up and exit
> gracefully.
Right, so instead of failing with an error, it hang