from:"David Teigland"

Re: [Cluster-devel] [PATCH] for a header-file-cannot-found building error

2007-08-20 Thread David Teigland

On Sun, Aug 19, 2007 at 07:32:57AM +0200, Fabio Massimo Di Nitto wrote:
 David Teigland wrote:
 
  
  I believe that the correct solution is to install the necessary kernel
  headers into /usr/include/linux/ prior to building cluster.  This
  usually means doing something like this:
cd /usr/src/linux
make headers_install
cp usr/include/linux/dlm* /usr/include/linux/
cp usr/include/linux/gfs* /usr/include/linux/
cp usr/include/linux/lock_dlm_plock.h /usr/include/linux/
cp usr/include/linux/lm_interface.h /usr/include/linux/
(that's all I can think of at the moment)
   
  Dave
   
 
 I did look into this a bit more and we need to make some kind of a
 decision here.
 
 If we expect people building the cluster to install the headers with
 headers_install then we can basically remove all the references to
 KERNEL_SRC in all userland and everything will keep building just fine.
 
 If we want to allow people to build the cluster with an outside kernel
 then we need to fix or change the dirafter and change the Makefiles
 that use KERNEL_SRC so that they are all consistent.
 
 Given that we need KERNEL_SRC defined for the kernel modules that we
 carry around in the CVS tree, I would suggest to go for the latter
 solution and allow our users to build whatever they need.

Yep, I agree

Re: [Cluster-devel] [PATCH] dlm/test Makefile cleanup

2007-08-24 Thread David Teigland

On Fri, Aug 24, 2007 at 02:06:29PM +0200, Fabio Massimo Di Nitto wrote:
 
 Hi David,
 
 any objections to this cleanup?
 
 It's very simple and just brings the Makefile in line with all the others.
 
 note that we don't build tests by default or ship them. This is not changed.

Looks good, thanks.

Re: [Cluster-devel] [GFS2] Remove ail2 list from the ai

2007-08-27 Thread David Teigland

On Mon, Aug 27, 2007 at 05:00:06PM +0100, Steven Whitehouse wrote:
 From 2a666f519dd12e8b3a82d1e16cad3114cfdd917d Mon Sep 17 00:00:00 2001
 From: Steven Whitehouse [EMAIL PROTECTED]
 Date: Mon, 27 Aug 2007 16:42:29 +0100
 Subject: [PATCH] [GFS2] Remove ail2 list from the ai
 
 The ail2 list wasn't actually used for anything other than gathering
 buffers at the end of in-place writeback, so remove it.

cluster/doc/journaling.txt has a nice description of ail2 and what it
does:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/doc/journaling.txt?rev=1.1&content-type=text/x-cvsweb-markup&cvsroot=cluster

Why is it no longer used?

[Cluster-devel] configure sbindir default

2007-09-06 Thread David Teigland

It looks to me like configure should be setting the default sbindir to
{prefix}/usr/sbin instead of {prefix}/sbin.

As it is now, using openais DESTDIR=/ and the default cluster/configure
settings, aisexec is installed to /usr/sbin/aisexec but cman_tool looks
for it in /sbin/aisexec and fails.

Dave

[Cluster-devel] -O2 -Werror

2007-09-06 Thread David Teigland

configure was recently changed from a default of -O0 to -O2.  A couple of
places in the tree also use -Werror.  The new combination of -O2 and
-Werror breaks the build when using default configure setting.

[using gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13)]

We need to remove the -Werror's, go back to a default of -O0, or change
the code that breaks.  I doubt that we'll be able to go too long with both
-O2 and -Werror, though, so one of them will probably need to change.

To get things building until someone comes up with a better solution, I'm
taking the simplest route and going back to a default of -O0.

Dave
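
A hedged illustration of why the -O2 plus -Werror combination is fragile: some GCC warnings, -Wuninitialized among them, rely on data-flow analysis that older GCC releases only performed when optimizing, so code that builds cleanly at -O0 can start producing warnings at -O2, and -Werror then turns each one into a build failure. The function below is a made-up example (last_byte_or is a hypothetical name, not code from the cluster tree), and whether a particular GCC version actually warns on it is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the cluster tree.  'value' is assigned
   only when len > 0 and is read only when have_value is set, so the
   code is correct, but the compiler has to prove that the two
   conditions line up.  Flow-sensitive warnings of this kind are the
   ones that tend to appear or disappear with the optimization level. */
int last_byte_or(const unsigned char *buf, size_t len, int fallback)
{
    int value;
    int have_value = 0;

    assert(buf != NULL || len == 0);

    if (len > 0) {
        value = buf[len - 1];
        have_value = 1;
    }

    if (have_value)
        return value;
    return fallback;
}
```

Initializing value at its declaration would silence any such warning at every optimization level, which is the "change the code that breaks" option mentioned above.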

[Cluster-devel] Re: configure sbindir default

2007-09-06 Thread David Teigland

On Thu, Sep 06, 2007 at 08:36:25PM +0200, Fabio Massimo Di Nitto wrote:
 David Teigland wrote:
  It looks to me like configure should be setting the default sbindir to
  {prefix}/usr/sbin instead of {prefix}/sbin.
  
  As it is now, using openais DESTDIR=/ and the default cluster/configure
  settings, aisexec is installed to /usr/sbin/aisexec but cman_tool looks
  for it in /sbin/aisexec and fails.

 I just noticed the discussion on IRC. I will look into it more deeply.
 There are a lot of default paths that do not really make sense and that
 we should review.

Perhaps, but in this case the default *was* /usr/sbin, which worked,
before the new configure which changed it to /sbin.

Re: [Cluster-devel] [PATCH] [GFS2] bz 276631 : GFS2: chmod hung - TRY 2

2007-09-14 Thread David Teigland

On Thu, Sep 13, 2007 at 11:04:43PM -0500, Bob Peterson wrote:
 diff -pur a/fs/gfs2/locking/dlm/thread.c b/fs/gfs2/locking/dlm/thread.c
 --- a/fs/gfs2/locking/dlm/thread.c2007-09-13 17:33:58.0 -0500
 +++ b/fs/gfs2/locking/dlm/thread.c2007-09-13 22:47:14.0 -0500
 @@ -279,8 +279,10 @@ static int gdlm_thread(void *data)
   /* Only thread1 is allowed to do blocking callbacks since gfs
  may wait for a completion callback within a blocking cb. */
  
 + spin_lock(&ls->async_lock);
   if (current == ls->thread1)
   blist = 1;
 + spin_unlock(&ls->async_lock);
  
   while (!kthread_should_stop()) {
   set_current_state(TASK_INTERRUPTIBLE);
 @@ -338,10 +340,12 @@ int gdlm_init_threads(struct gdlm_ls *ls
   struct task_struct *p;
   int error;
  
 + spin_lock(&ls->async_lock);
   p = kthread_run(gdlm_thread, ls, "lock_dlm1");
   error = IS_ERR(p);
   if (error) {
   log_error("can't start lock_dlm1 thread %d", error);
 + spin_unlock(&ls->async_lock);
   return error;
   }
   ls->thread1 = p;
 @@ -351,9 +355,11 @@ int gdlm_init_threads(struct gdlm_ls *ls
   if (error) {
   log_error("can't start lock_dlm2 thread %d", error);
   kthread_stop(ls->thread1);
 + spin_unlock(&ls->async_lock);
   return error;
   }
   ls->thread2 = p;
 + spin_unlock(&ls->async_lock);

This is strange.  First, it seems very likely to me that kthread_run could
sleep, and almost certain that kthread_stop will sleep, and neither is
allowed while holding a spinlock.  Second, signaling a completion from one
thread to another like this may be common with mutexes/completions, but
not with spinlocks.  I'd suggest one of two alternatives.  Either have the
thread check its own name for lock_dlm1, or add a new gdlm_thread1
function, i.e.

int _gdlm_thread(struct gdlm_ls *ls, int blist)
{
	/* current function goes here */
}

int gdlm_thread1(void *data)
{
	return _gdlm_thread(data, 1);
}

int gdlm_thread2(void *data)
{
	return _gdlm_thread(data, 0);
}

kthread_run(gdlm_thread1, ls, "lock_dlm1");
kthread_run(gdlm_thread2, ls, "lock_dlm2");
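
The wrapper approach suggested above can be tried out in userspace. What follows is a minimal sketch under stated assumptions: struct fake_ls, worker_common, worker1, worker2 and run_entry are all hypothetical names, and run_entry simply calls the entry point where the kernel would use kthread_run(). The point it illustrates is that each thread learns its role from its entry function, so no lock or shared field is needed to decide who does blocking callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct gdlm_ls; the fields are hypothetical and only
   record what each worker decided so the behaviour can be observed. */
struct fake_ls {
    int blocking_cb_runs;   /* times a blist-enabled worker ran */
    int plain_runs;         /* times a non-blist worker ran */
};

/* Common body: the role arrives as an argument instead of being
   inferred from shared state such as ls->thread1. */
static int worker_common(struct fake_ls *ls, int blist)
{
    assert(ls != NULL);
    if (blist)
        ls->blocking_cb_runs++;
    else
        ls->plain_runs++;
    return 0;
}

/* Entry points with the int (*)(void *) signature a thread API expects. */
int worker1(void *data) { return worker_common(data, 1); }
int worker2(void *data) { return worker_common(data, 0); }

/* Tiny stand-in for kthread_run(): it just calls the entry point. */
int run_entry(int (*fn)(void *), void *data)
{
    return fn(data);
}
```

Calling run_entry(worker1, &ls) and then run_entry(worker2, &ls) on a zeroed struct fake_ls leaves both counters at 1, with no ordering dependency between thread creation and role assignment.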

[Cluster-devel] [PATCH] dlm: block dlm_recv in recovery transition

2007-09-27 Thread David Teigland

Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
threads while working in the dlm.  This allows dlm_recv activity to be
suspended when the lockspace transitions to, from and between recovery
cycles.

The specific bug prompting this change is one where an in-progress
recovery cycle is aborted by a new recovery cycle.  While dlm_recv was
processing a recovery message, the recovery cycle was aborted and
dlm_recoverd began cleaning up.  dlm_recv decremented recover_locks_count
on an rsb after dlm_recoverd had reset it to zero.  This is fixed by
suspending dlm_recv (taking write lock on the rwsem) before aborting the
current recovery.

The transitions to/from normal and recovery modes are simplified by using
this new ability to block dlm_recv.  The switch from normal to recovery
mode means dlm_recv goes from processing locking messages, to saving them
for later, and vice versa.  Races are avoided by blocking dlm_recv when
setting the flag that switches between modes.

Signed-off-by: David Teigland [EMAIL PROTECTED]
---

diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index 74901e9..d2fc238 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -491,6 +491,7 @@ struct dlm_ls {
 	uint64_t		ls_recover_seq;
 	struct dlm_recover	*ls_recover_args;
 	struct rw_semaphore	ls_in_recovery;	/* block local requests */
+	struct rw_semaphore	ls_recv_active;	/* block dlm_recv */
 	struct list_head	ls_requestqueue;/* queue remote requests */
 	struct mutex		ls_requestqueue_mutex;
 	char			*ls_recover_buf;
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 2082daf..8aef639 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3638,55 +3638,8 @@ static void receive_lookup_reply(struct dlm_ls *ls, struct dlm_message *ms)
dlm_put_lkb(lkb);
 }
 
-int dlm_receive_message(struct dlm_header *hd, int nodeid, int recovery)
+static void _receive_message(struct dlm_ls *ls, struct dlm_message *ms)
 {
-   struct dlm_message *ms = (struct dlm_message *) hd;
-   struct dlm_ls *ls;
-   int error = 0;
-
-   if (!recovery)
-   dlm_message_in(ms);
-
-   ls = dlm_find_lockspace_global(hd->h_lockspace);
-   if (!ls) {
-   log_print("drop message %d from %d for unknown lockspace %d",
- ms->m_type, nodeid, hd->h_lockspace);
-   return -EINVAL;
-   }
-
-   /* recovery may have just ended leaving a bunch of backed-up requests
-  in the requestqueue; wait while dlm_recoverd clears them */
-
-   if (!recovery)
-   dlm_wait_requestqueue(ls);
-
-   /* recovery may have just started while there were a bunch of
-  in-flight requests -- save them in requestqueue to be processed
-  after recovery.  we can't let dlm_recvd block on the recovery
-  lock.  if dlm_recoverd is calling this function to clear the
-  requestqueue, it needs to be interrupted (-EINTR) if another
-  recovery operation is starting. */
-
-   while (1) {
-   if (dlm_locking_stopped(ls)) {
-   if (recovery) {
-   error = -EINTR;
-   goto out;
-   }
-   error = dlm_add_requestqueue(ls, nodeid, hd);
-   if (error == -EAGAIN)
-   continue;
-   else {
-   error = -EINTR;
-   goto out;
-   }
-   }
-
-   if (dlm_lock_recovery_try(ls))
-   break;
-   schedule();
-   }
-
switch (ms->m_type) {
 
/* messages sent to a master node */
@@ -3761,17 +3714,90 @@ int dlm_receive_message(struct dlm_header *hd, int nodeid, int recovery)
log_error(ls, "unknown message type %d", ms->m_type);
}
 
-   dlm_unlock_recovery(ls);
- out:
-   dlm_put_lockspace(ls);
dlm_astd_wake();
-   return error;
 }
 
+/* If the lockspace is in recovery mode (locking stopped), then normal
+   messages are saved on the requestqueue for processing after recovery is
+   done.  When not in recovery mode, we wait for dlm_recoverd to drain saved
+   messages off the requestqueue before we process new ones. This occurs right
+   after recovery completes when we transition from saving all messages on
+   requestqueue, to processing all the saved messages, to processing new
+   messages as they arrive. */
 
-/*
- * Recovery related
- */
+static void dlm_receive_message(struct dlm_ls *ls, struct dlm_message *ms,
+   int nodeid)
+{
+   if (dlm_locking_stopped(ls)) {
+   dlm_add_requestqueue(ls, nodeid, (struct dlm_header *) ms);
+   } else {
+   dlm_wait_requestqueue(ls
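
The save-then-drain behaviour described in the patch comment above can be modelled in a few lines of plain C. This is only a single-threaded sketch: struct model_ls and the model_* functions are hypothetical, it leaves out the ls_recv_active rwsem entirely, and so it shows the message ordering the patch aims for rather than the race it fixes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAVED 16

/* Hypothetical stand-in for a lockspace; none of these fields are the
   real struct dlm_ls members. */
struct model_ls {
    int locking_stopped;            /* 1 while in recovery mode */
    int saved[MAX_SAVED];           /* messages parked during recovery */
    size_t nsaved;
    int processed[MAX_SAVED * 2];   /* order in which messages were handled */
    size_t nprocessed;
};

static void process(struct model_ls *ls, int msg)
{
    assert(ls->nprocessed < MAX_SAVED * 2);
    ls->processed[ls->nprocessed++] = msg;
}

/* Receive path: save while locking is stopped, otherwise process now.
   The real code first waits for the saved queue to drain; in this
   model the queue is always emptied by model_end_recovery(). */
void model_receive(struct model_ls *ls, int msg)
{
    if (ls->locking_stopped) {
        assert(ls->nsaved < MAX_SAVED);
        ls->saved[ls->nsaved++] = msg;
    } else {
        process(ls, msg);
    }
}

void model_start_recovery(struct model_ls *ls)
{
    ls->locking_stopped = 1;
}

/* Leaving recovery: replay saved messages in arrival order, then let
   new messages through again. */
void model_end_recovery(struct model_ls *ls)
{
    size_t i;

    for (i = 0; i < ls->nsaved; i++)
        process(ls, ls->saved[i]);
    ls->nsaved = 0;
    ls->locking_stopped = 0;
}
```

In the kernel the flag flip in model_start_recovery() and model_end_recovery() is the step that has to happen while dlm_recv is blocked; the model has no concurrency, so that requirement is implicit here.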

Re: [Cluster-devel] time for STABLE2 branch

2007-10-09 Thread David Teigland

On Tue, Sep 18, 2007 at 04:40:45PM -0500, David Teigland wrote:
 I believe the time has arrived for a STABLE2 cluster branch.
 
 CVS HEAD hasn't been usable by most people for quite a while, partly due
 to ugly build requirements and partly due to instability of new code.
 Building HEAD currently requires the unstable version of openais, and a
 gfs2 patch that only exists in the gfs2 git tree -- both things that most
 people don't want to use, or are difficult to use.
 
 The RHEL5 branch has also been largely unusable for a long time, because
 it requires the RHEL5.1 kernel to build it, which isn't generally
 available AFAIK, much less widely in use.
 
 The STABLE2 branch will:
 - work with the latest stable upstream kernel from kernel.org
 - work with the whitetank branch of openais
 - otherwise have the same code/patches as the RHEL5 branch
 
 I think we should probably branch STABLE2 off of HEAD and then back out
 the changes that don't agree with the points above.  The reason we've been
 putting this off is because of the pain of checking changes into another
 branch.  I'm hoping to come up with some way to manage and track this with
 less manual intervention, maybe even automate it to some extent.

I'm hoping that we can switch to git on sourceware.org prior to doing the
STABLE2 branch.

Given a STABLE2 git branch, another new branch for Fedora would be a short
and easy step away (there's a chance that STABLE2 itself may work for
Fedora.)

Dave

Re: [Cluster-devel] [PATCH][GFS2] Given device ID rather than s_id in id sysfs file

2007-11-02 Thread David Teigland

On Fri, Nov 02, 2007 at 09:37:15AM -0500, Bob Peterson wrote:
 Hi,
 
 This patch changes the /sys/fs/gfs2/<s_id>/id file to give the device
 id major:minor rather than the s_id.  That enables gfs2_tool to
 match devices properly (by id, not name) when locating the tuning files.

We have to be extremely cautious when changing the kernel abi like this;
have you verified that it doesn't break any existing programs?

 
 Regards,
 
 Bob Peterson
 --
 Signed-off-by: Bob Peterson [EMAIL PROTECTED]
 --
  fs/gfs2/sys.c |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)
 
 diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
 index 06e0b77..10807b7 100644
 --- a/fs/gfs2/sys.c
 +++ b/fs/gfs2/sys.c
 @@ -32,7 +32,8 @@ spinlock_t gfs2_sys_margs_lock;
  
  static ssize_t id_show(struct gfs2_sbd *sdp, char *buf)
  {
 - return snprintf(buf, PAGE_SIZE, "%s\n", sdp->sd_vfs->s_id);
 + return snprintf(buf, PAGE_SIZE, "%u:%u\n",
 + MAJOR(sdp->sd_vfs->s_dev), MINOR(sdp->sd_vfs->s_dev));
  }
  
  static ssize_t fsname_show(struct gfs2_sbd *sdp, char *buf)
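
For reference, the major:minor format proposed in the patch above can be reproduced in userspace. This is a sketch assuming Linux with glibc, where major(), minor() and makedev() come from <sys/sysmacros.h>; format_dev_id is a hypothetical name and is not part of gfs2_tool.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysmacros.h>   /* major(), minor(), makedev() on glibc */

/* Hypothetical userspace counterpart of the patched id_show(): writes
   "major:minor\n" for the given device number into buf and returns
   the snprintf() result. */
int format_dev_id(char *buf, size_t size, dev_t dev)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%u:%u\n", major(dev), minor(dev));
}
```

A tool that wants to match a mounted device against the sysfs file could, under these assumptions, format the st_rdev value returned by stat(2) on the block device the same way and compare the two strings.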

[Cluster-devel] Re: [2.6 patch] fs/dlm/: proper prototypes

2007-11-05 Thread David Teigland

On Sat, Nov 03, 2007 at 01:04:30AM +0100, Adrian Bunk wrote:
 This patch adds a proper prototype for some functions in 
 fs/dlm/dlm_internal.h

Acked-by: David Teigland [EMAIL PROTECTED]


 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
 
 ---
 
  fs/dlm/dlm_internal.h |   16 
  fs/dlm/lock.c |1 -
  fs/dlm/lockspace.c|8 
  fs/dlm/main.c |   10 --
  4 files changed, 16 insertions(+), 19 deletions(-)
 
 11349b53af8d04ff007660d9142e2382033f5d8f 
 diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
 index d2fc238..ec61bba 100644
 --- a/fs/dlm/dlm_internal.h
 +++ b/fs/dlm/dlm_internal.h
 @@ -570,5 +570,21 @@ static inline int dlm_no_directory(struct dlm_ls *ls)
   return (ls->ls_exflags & DLM_LSFL_NODIR) ? 1 : 0;
  }
  
 +int dlm_netlink_init(void);
 +void dlm_netlink_exit(void);
 +void dlm_timeout_warn(struct dlm_lkb *lkb);
 +
 +#ifdef CONFIG_DLM_DEBUG
 +int dlm_register_debugfs(void);
 +void dlm_unregister_debugfs(void);
 +int dlm_create_debug_file(struct dlm_ls *ls);
 +void dlm_delete_debug_file(struct dlm_ls *ls);
 +#else
 +static inline int dlm_register_debugfs(void) { return 0; }
 +static inline void dlm_unregister_debugfs(void) { }
 +static inline int dlm_create_debug_file(struct dlm_ls *ls) { return 0; }
 +static inline void dlm_delete_debug_file(struct dlm_ls *ls) { }
 +#endif
 +
  #endif   /* __DLM_INTERNAL_DOT_H__ */
  
 diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
 index 3915b8e..7bc6ad9 100644
 --- a/fs/dlm/lock.c
 +++ b/fs/dlm/lock.c
 @@ -88,7 +88,6 @@ static void __receive_convert_reply(struct dlm_rsb *r, struct dlm_lkb *lkb,
  static int receive_extralen(struct dlm_message *ms);
  static void do_purge(struct dlm_ls *ls, int nodeid, int pid);
  static void del_timeout(struct dlm_lkb *lkb);
 -void dlm_timeout_warn(struct dlm_lkb *lkb);
  
  /*
   * Lock compatibilty matrix - thanks Steve
 diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
 index 6353a83..b99485b 100644
 --- a/fs/dlm/lockspace.c
 +++ b/fs/dlm/lockspace.c
 @@ -24,14 +24,6 @@
  #include "recover.h"
  #include "requestqueue.h"
  
 -#ifdef CONFIG_DLM_DEBUG
 -int dlm_create_debug_file(struct dlm_ls *ls);
 -void dlm_delete_debug_file(struct dlm_ls *ls);
 -#else
 -static inline int dlm_create_debug_file(struct dlm_ls *ls) { return 0; }
 -static inline void dlm_delete_debug_file(struct dlm_ls *ls) { }
 -#endif
 -
  static int   ls_count;
  static struct mutex  ls_lock;
  static struct list_head  lslist;
 diff --git a/fs/dlm/main.c b/fs/dlm/main.c
 index eca2907..58487fb 100644
 --- a/fs/dlm/main.c
 +++ b/fs/dlm/main.c
 @@ -18,16 +18,6 @@
  #include "memory.h"
  #include "config.h"
  
 -#ifdef CONFIG_DLM_DEBUG
 -int dlm_register_debugfs(void);
 -void dlm_unregister_debugfs(void);
 -#else
 -static inline int dlm_register_debugfs(void) { return 0; }
 -static inline void dlm_unregister_debugfs(void) { }
 -#endif
 -int dlm_netlink_init(void);
 -void dlm_netlink_exit(void);
 -
  static int __init init_dlm(void)
  {
   int error;

[Cluster-devel] [PATCH] gfs2: check kthread_should_stop when waiting

2007-11-07 Thread David Teigland

Use wait_event_interruptible() in the lock_dlm thread instead
of an open coded equivalent, and include a kthread_should_stop()
check in the wait test so we don't miss a kthread_stop().

Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/gfs2/locking/dlm/thread.c |9 ++---
 1 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/gfs2/locking/dlm/thread.c b/fs/gfs2/locking/dlm/thread.c
index bd938f0..521694f 100644
--- a/fs/gfs2/locking/dlm/thread.c
+++ b/fs/gfs2/locking/dlm/thread.c
@@ -273,18 +273,13 @@ static int gdlm_thread(void *data, int blist)
struct gdlm_ls *ls = (struct gdlm_ls *) data;
struct gdlm_lock *lp = NULL;
uint8_t complete, blocking, submit, drop;
-   DECLARE_WAITQUEUE(wait, current);
 
/* Only thread1 is allowed to do blocking callbacks since gfs
   may wait for a completion callback within a blocking cb. */
 
while (!kthread_should_stop()) {
-   set_current_state(TASK_INTERRUPTIBLE);
-   add_wait_queue(&ls->thread_wait, &wait);
-   if (no_work(ls, blist))
-   schedule();
-   remove_wait_queue(&ls->thread_wait, &wait);
-   set_current_state(TASK_RUNNING);
+   wait_event_interruptible(ls->thread_wait,
+   !no_work(ls, blist) || kthread_should_stop());
 
complete = blocking = submit = drop = 0;
 
-- 
1.5.2.1

[Cluster-devel] [PATCH] dlm: use dlm prefix on alloc and free functions

2007-11-07 Thread David Teigland

The dlm functions in memory.c should use the dlm_ prefix.  Also, use
kzalloc/kfree directly for dlm_direntry's, removing the wrapper functions.

Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/dir.c   |   10 +-
 fs/dlm/lock.c  |   26 +-
 fs/dlm/lockspace.c |8 
 fs/dlm/memory.c|   32 
 fs/dlm/memory.h|   16 +++-
 fs/dlm/recover.c   |4 ++--
 6 files changed, 39 insertions(+), 57 deletions(-)

diff --git a/fs/dlm/dir.c b/fs/dlm/dir.c
index 4675455..600bb1d 100644
--- a/fs/dlm/dir.c
+++ b/fs/dlm/dir.c
@@ -49,7 +49,7 @@ static struct dlm_direntry *get_free_de(struct dlm_ls *ls, int len)
spin_unlock(&ls->ls_recover_list_lock);
 
if (!found)
-   de = allocate_direntry(ls, len);
+   de = kzalloc(sizeof(struct dlm_direntry) + len, GFP_KERNEL);
return de;
 }
 
@@ -62,7 +62,7 @@ void dlm_clear_free_entries(struct dlm_ls *ls)
de = list_entry(ls->ls_recover_list.next, struct dlm_direntry,
list);
list_del(&de->list);
-   free_direntry(de);
+   kfree(de);
}
spin_unlock(&ls->ls_recover_list_lock);
 }
@@ -171,7 +171,7 @@ void dlm_dir_remove_entry(struct dlm_ls *ls, int nodeid, char *name, int namelen
}
 
list_del(&de->list);
-   free_direntry(de);
+   kfree(de);
  out:
write_unlock(&ls->ls_dirtbl[bucket].lock);
 }
@@ -302,7 +302,7 @@ static int get_entry(struct dlm_ls *ls, int nodeid, char *name,
 
write_unlock(&ls->ls_dirtbl[bucket].lock);
 
-   de = allocate_direntry(ls, namelen);
+   de = kzalloc(sizeof(struct dlm_direntry) + namelen, GFP_KERNEL);
if (!de)
return -ENOMEM;
 
@@ -313,7 +313,7 @@ static int get_entry(struct dlm_ls *ls, int nodeid, char *name,
write_lock(&ls->ls_dirtbl[bucket].lock);
tmp = search_bucket(ls, name, namelen, bucket);
if (tmp) {
-   free_direntry(de);
+   kfree(de);
de = tmp;
} else {
list_add_tail(&de->list, &ls->ls_dirtbl[bucket].list);
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 3915b8e..68f918e 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -335,7 +335,7 @@ static struct dlm_rsb *create_rsb(struct dlm_ls *ls, char *name, int len)
 {
struct dlm_rsb *r;
 
-   r = allocate_rsb(ls, len);
+   r = dlm_allocate_rsb(ls, len);
if (!r)
return NULL;
 
@@ -478,7 +478,7 @@ static int find_rsb(struct dlm_ls *ls, char *name, int namelen,
error = _search_rsb(ls, name, namelen, bucket, 0, &tmp);
if (!error) {
write_unlock(&ls->ls_rsbtbl[bucket].lock);
-   free_rsb(r);
+   dlm_free_rsb(r);
r = tmp;
goto out;
}
@@ -519,7 +519,7 @@ static void toss_rsb(struct kref *kref)
list_move(&r->res_hashchain, &ls->ls_rsbtbl[r->res_bucket].toss);
r->res_toss_time = jiffies;
if (r->res_lvbptr) {
-   free_lvb(r->res_lvbptr);
+   dlm_free_lvb(r->res_lvbptr);
r->res_lvbptr = NULL;
}
 }
@@ -589,7 +589,7 @@ static int create_lkb(struct dlm_ls *ls, struct dlm_lkb **lkb_ret)
uint32_t lkid = 0;
uint16_t bucket;
 
-   lkb = allocate_lkb(ls);
+   lkb = dlm_allocate_lkb(ls);
if (!lkb)
return -ENOMEM;
 
@@ -683,8 +683,8 @@ static int __put_lkb(struct dlm_ls *ls, struct dlm_lkb *lkb)
 
/* for local/process lkbs, lvbptr points to caller's lksb */
if (lkb->lkb_lvbptr && is_master_copy(lkb))
-   free_lvb(lkb->lkb_lvbptr);
-   free_lkb(lkb);
+   dlm_free_lvb(lkb->lkb_lvbptr);
+   dlm_free_lkb(lkb);
return 1;
} else {
write_unlock(&ls->ls_lkbtbl[bucket].lock);
@@ -988,7 +988,7 @@ static int shrink_bucket(struct dlm_ls *ls, int b)
 
if (is_master(r))
dir_remove(r);
-   free_rsb(r);
+   dlm_free_rsb(r);
count++;
} else {
write_unlock(&ls->ls_rsbtbl[b].lock);
@@ -1171,7 +1171,7 @@ static void set_lvb_lock(struct dlm_rsb *r, struct dlm_lkb *lkb)
return;
 
if (!r->res_lvbptr)
-   r->res_lvbptr = allocate_lvb(r->res_ls);
+   r->res_lvbptr = dlm_allocate_lvb(r->res_ls);
 
if (!r->res_lvbptr)
return;
@@ -1203,7 +1203,7 @@ static void set_lvb_unlock(struct dlm_rsb *r, struct dlm_lkb *lkb)
return;
 
if (!r->res_lvbptr)
-   r->res_lvbptr = allocate_lvb(r->res_ls);
+   r->res_lvbptr = dlm_allocate_lvb(r->res_ls);
 
if (!r->res_lvbptr
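
The direntry change in this patch replaces a wrapper with a direct kzalloc(sizeof(struct dlm_direntry) + len, GFP_KERNEL), i.e. one zeroed allocation holding a fixed header plus a variable-length name. A userspace sketch of the same pattern follows, with calloc() standing in for kzalloc(); struct direntry and direntry_alloc are hypothetical and do not reproduce the real struct dlm_direntry layout.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct dlm_direntry: a fixed header
   followed by a variable-length name in the same allocation. */
struct direntry {
    int master_nodeid;
    unsigned short length;
    char name[];            /* C99 flexible array member */
};

/* Userspace analogue of kzalloc(sizeof(struct dlm_direntry) + len, ...):
   calloc() zero-fills, so fields that are not set start at 0. */
struct direntry *direntry_alloc(const char *name, unsigned short len)
{
    struct direntry *de;

    assert(name != NULL);
    de = calloc(1, sizeof(*de) + len);
    if (!de)
        return NULL;
    de->length = len;
    memcpy(de->name, name, len);
    return de;
}
```

Because header and name share one allocation, a single free() (kfree() in the kernel) releases both, which is why the patch can drop free_direntry() as well.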

[Cluster-devel] [PATCH] dlm: don't print common non-errors

2007-11-07 Thread David Teigland

Change log_error() to log_debug() for conditions that can occur in
large number in normal operation.

Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/lock.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 3915b8e..c21deba 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -4259,7 +4259,7 @@ int dlm_recover_master_copy(struct dlm_ls *ls, struct dlm_rcom *rc)
put_rsb(r);
  out:
if (error)
-   log_print("recover_master_copy %d %x", error, rl->rl_lkid);
+   log_debug(ls, "recover_master_copy %d %x", error, rl->rl_lkid);
rl->rl_result = error;
return error;
 }
-- 
1.5.2.1

[Cluster-devel] [PATCH] gfs2: tidy up error message

2007-11-16 Thread David Teigland

Print error with log_error() to be consistent with others.

Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/gfs2/locking/dlm/mount.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/locking/dlm/mount.c b/fs/gfs2/locking/dlm/mount.c
index ab30102..f2efff4 100644
--- a/fs/gfs2/locking/dlm/mount.c
+++ b/fs/gfs2/locking/dlm/mount.c
@@ -68,8 +68,7 @@ static int make_args(struct gdlm_ls *ls, char *data_arg, int *nodir)
strncpy(data, data_arg, 255);
 
if (!strlen(data)) {
-   printk(KERN_ERR
-  "DLM/GFS2/GFS ERROR: (u)mount helpers are not installed!\n");
+   log_error("no mount options, (u)mount helpers not installed");
return -EINVAL;
}
 
-- 
1.5.2.1

Re: [Cluster-devel] cluster/group/gfs_controld lock_dlm.h

2007-11-21 Thread David Teigland

On Wed, Nov 21, 2007 at 05:50:16PM -, [EMAIL PROTECTED] wrote:
 CVSROOT:  /cvs/cluster
 Module name:  cluster
 Branch:   RHEL5
 Changes by:   [EMAIL PROTECTED]   2007-11-21 17:50:16
 
 Modified files:
   group/gfs_controld: lock_dlm.h 
 
 Log message:
   ASSERT was doing fprintf(stderr) which goes somewhere we don't want
   when running as a daemon.

created bz 394721 for this

[Cluster-devel] [PATCH] gfs2: use pid for plock owner for nfs clients

2007-12-06 Thread David Teigland

The fl_owner is that of lockd when posix locks arrive from nfs
clients, so it can't be used to distinguish between lock holders.
Use fl_pid as owner instead; it's the pid of the process on the
nfs client.

Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/gfs2/locking/dlm/plock.c |   18 ++
 1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/locking/dlm/plock.c b/fs/gfs2/locking/dlm/plock.c
index 1f7b038..2ebd374 100644
--- a/fs/gfs2/locking/dlm/plock.c
+++ b/fs/gfs2/locking/dlm/plock.c
@@ -89,15 +89,19 @@ int gdlm_plock(void *lockspace, struct lm_lockname *name,
op->info.number = name->ln_number;
op->info.start  = fl->fl_start;
op->info.end    = fl->fl_end;
-   op->info.owner  = (__u64)(long) fl->fl_owner;
if (fl->fl_lmops && fl->fl_lmops->fl_grant) {
+   /* fl_owner is lockd which doesn't distinguish
+  processes on the nfs client */
+   op->info.owner  = (__u64) fl->fl_pid;
xop->callback   = fl->fl_lmops->fl_grant;
locks_init_lock(&xop->flc);
locks_copy_lock(&xop->flc, fl);
xop->fl = fl;
xop->file   = file;
-   } else
+   } else {
+   op->info.owner  = (__u64)(long) fl->fl_owner;
xop->callback   = NULL;
+   }
 
send_op(op);
 
@@ -203,7 +207,10 @@ int gdlm_punlock(void *lockspace, struct lm_lockname *name,
op->info.number = name->ln_number;
op->info.start  = fl->fl_start;
op->info.end    = fl->fl_end;
-   op->info.owner  = (__u64)(long) fl->fl_owner;
+   if (fl->fl_lmops && fl->fl_lmops->fl_grant)
+   op->info.owner  = (__u64) fl->fl_pid;
+   else
+   op->info.owner  = (__u64)(long) fl->fl_owner;
 
send_op(op);
wait_event(recv_wq, (op->done != 0));
@@ -242,7 +249,10 @@ int gdlm_plock_get(void *lockspace, struct lm_lockname *name,
op->info.number = name->ln_number;
op->info.start  = fl->fl_start;
op->info.end    = fl->fl_end;
-   op->info.owner  = (__u64)(long) fl->fl_owner;
+   if (fl->fl_lmops && fl->fl_lmops->fl_grant)
+   op->info.owner  = (__u64) fl->fl_pid;
+   else
+   op->info.owner  = (__u64)(long) fl->fl_owner;
 
send_op(op);
wait_event(recv_wq, (op->done != 0));
-- 
1.5.2.1
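
The owner-selection rule in this patch can be checked in isolation. The sketch below assumes a trimmed-down stand-in for struct file_lock (struct model_lock and plock_owner are hypothetical names); it only demonstrates that two NFS clients sharing one lockd owner pointer still get distinct owner values once the pid is used.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct file_lock. */
struct model_lock {
    void *owner;            /* like fl_owner: lockd itself for NFS requests */
    unsigned int pid;       /* like fl_pid: pid of the process on the client */
    int has_grant_cb;       /* nonzero when fl_lmops->fl_grant is set */
};

/* Pick the value used to tell lock holders apart, following the rule
   in the patch: pid for lockd-originated locks, owner pointer otherwise. */
uint64_t plock_owner(const struct model_lock *fl)
{
    assert(fl != NULL);
    if (fl->has_grant_cb)
        return (uint64_t) fl->pid;
    return (uint64_t)(uintptr_t) fl->owner;
}
```

With the owner pointer alone, every lock arriving through lockd would map to the same value, so conflicting requests from different client processes would look like one holder.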

Re: [Cluster-devel] Re: [PATCH] DLM: Fix static buffer alignment

2008-01-15 Thread David Teigland

 Hi Steven,
 
 you can try to pull from here:
 
 git pull git://git.fugedabout.it/people/fabbione/gfs2-2.6-nmw.git
 
 [DLM] Fix endian issue when transmitting or receiving LOCK_REPLY
 [DLM] align static buffer
 
 gitweb: 
 http://git.fugedabout.it/?p=people/fabbione/gfs2-2.6-nmw.git;a=summary
 
 Let me know if there are any problems...

I'm collecting all dlm patches for 2.6.25 here:

  http://people.redhat.com/teigland/dlm-patches-testing/

It includes the dlm patches already in the gfs2nmw git tree, plus the new
patches you've sent the past couple days.  I'm working on getting them
into my own public git tree.  If that doesn't happen soon enough for
2.6.25 I'll send the lot to Steve.

Re: [Cluster-devel] [PATCH] gfs2 umount: support fake -r option

2008-01-21 Thread David Teigland

On Sat, Jan 19, 2008 at 06:58:17AM +0100, Fabio M. Di Nitto wrote:
 
 Hi guys,
 
 in certain situations where gfs2 init scripts are not used to umount a gfs2 
 volume, we end up with umount.gfs2 being invoked with the -r option, and this 
 fails because we don't know what to do with this option.
 
 The patch simply acks the option and ignores it for now, allowing a reboot 
 process to keep going.
 
 Please ACK or apply.

ack

[Cluster-devel] current dlm patches

2008-01-21 Thread David Teigland


This is the current set of dlm patches that I'm collecting at
http://people.redhat.com/teigland/dlm-patches-testing/
I'm preparing to send these upstream for 2.6.25 in the next week
or so, depending on review and testing.  They come mainly from
- the mixed architecture testing and fixing that Fabio has been doing
- fixing problems with recovery overlapping my dlm stress program

[Cluster-devel] [PATCH] dlm: close othercons

2008-01-21 Thread David Teigland

From: Patrick Caulfield [EMAIL PROTECTED]

This patch addresses a problem introduced with the last round of
lowcomms patches where the 'othercon' connections do not get freed when
the DLM shuts down.

This results in the error message
slab error in kmem_cache_destroy(): cache `dlm_conn': Can't free all
objects

and the DLM cannot be restarted without a system reboot.

See bz#428119

Signed-off-by: Patrick Caulfield [EMAIL PROTECTED]
Signed-off-by: Fabio M. Di Nitto [EMAIL PROTECTED]
Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/lowcomms.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 0bea802..b1cb855 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1437,6 +1437,8 @@ void dlm_lowcomms_stop(void)
con = __nodeid2con(i, 0);
if (con) {
close_connection(con, true);
+   if (con->othercon)
+   kmem_cache_free(con_cache, con->othercon);
kmem_cache_free(con_cache, con);
}
}
-- 
1.5.3.3
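
The leak this patch fixes can be modelled in userspace by counting live objects, which is roughly what kmem_cache_destroy() complains about when it reports that it cannot free all objects. Everything below is a hypothetical stand-in: struct conn, conn_alloc, conn_stop and the live_objects counter are not the lowcomms code, and malloc-family calls replace the slab cache.

```c
#include <assert.h>
#include <stdlib.h>

static int live_objects;    /* stands in for the slab cache's object count */

/* Hypothetical stand-in for struct connection in lowcomms. */
struct conn {
    struct conn *othercon;  /* secondary connection, may be NULL */
};

struct conn *conn_alloc(void)
{
    struct conn *c = calloc(1, sizeof(*c));

    if (c)
        live_objects++;
    return c;
}

static void conn_free_one(struct conn *c)
{
    assert(live_objects > 0);
    free(c);
    live_objects--;
}

/* Shutdown path: release the attached othercon before the parent so
   that no object is left behind in the cache. */
void conn_stop(struct conn *c)
{
    if (!c)
        return;
    if (c->othercon)
        conn_free_one(c->othercon);
    conn_free_one(c);
}

int conn_live_objects(void)
{
    return live_objects;
}
```

Dropping the othercon branch from conn_stop() leaves conn_live_objects() at 1 after shutdown, which corresponds to the state in which the real cache could not be destroyed.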

[Cluster-devel] [PATCH] dlm: proper prototypes

2008-01-21 Thread David Teigland

From: Adrian Bunk [EMAIL PROTECTED]

This patch adds a proper prototype for some functions in
fs/dlm/dlm_internal.h

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/dlm_internal.h |   16 
 fs/dlm/lock.c |1 -
 fs/dlm/lockspace.c|8 
 fs/dlm/main.c |   10 --
 4 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index d2fc238..ec61bba 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -570,5 +570,21 @@ static inline int dlm_no_directory(struct dlm_ls *ls)
return (ls->ls_exflags & DLM_LSFL_NODIR) ? 1 : 0;
 }
 
+int dlm_netlink_init(void);
+void dlm_netlink_exit(void);
+void dlm_timeout_warn(struct dlm_lkb *lkb);
+
+#ifdef CONFIG_DLM_DEBUG
+int dlm_register_debugfs(void);
+void dlm_unregister_debugfs(void);
+int dlm_create_debug_file(struct dlm_ls *ls);
+void dlm_delete_debug_file(struct dlm_ls *ls);
+#else
+static inline int dlm_register_debugfs(void) { return 0; }
+static inline void dlm_unregister_debugfs(void) { }
+static inline int dlm_create_debug_file(struct dlm_ls *ls) { return 0; }
+static inline void dlm_delete_debug_file(struct dlm_ls *ls) { }
+#endif
+
 #endif /* __DLM_INTERNAL_DOT_H__ */
 
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 3915b8e..7bc6ad9 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -88,7 +88,6 @@ static void __receive_convert_reply(struct dlm_rsb *r, struct dlm_lkb *lkb,
 static int receive_extralen(struct dlm_message *ms);
 static void do_purge(struct dlm_ls *ls, int nodeid, int pid);
 static void del_timeout(struct dlm_lkb *lkb);
-void dlm_timeout_warn(struct dlm_lkb *lkb);
 
 /*
  * Lock compatibilty matrix - thanks Steve
diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index 6353a83..b99485b 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -24,14 +24,6 @@
 #include "recover.h"
 #include "requestqueue.h"
 
-#ifdef CONFIG_DLM_DEBUG
-int dlm_create_debug_file(struct dlm_ls *ls);
-void dlm_delete_debug_file(struct dlm_ls *ls);
-#else
-static inline int dlm_create_debug_file(struct dlm_ls *ls) { return 0; }
-static inline void dlm_delete_debug_file(struct dlm_ls *ls) { }
-#endif
-
 static int			ls_count;
 static struct mutex		ls_lock;
 static struct list_head	lslist;
diff --git a/fs/dlm/main.c b/fs/dlm/main.c
index eca2907..58487fb 100644
--- a/fs/dlm/main.c
+++ b/fs/dlm/main.c
@@ -18,16 +18,6 @@
 #include "memory.h"
 #include "config.h"
 
-#ifdef CONFIG_DLM_DEBUG
-int dlm_register_debugfs(void);
-void dlm_unregister_debugfs(void);
-#else
-static inline int dlm_register_debugfs(void) { return 0; }
-static inline void dlm_unregister_debugfs(void) { }
-#endif
-int dlm_netlink_init(void);
-void dlm_netlink_exit(void);
-
 static int __init init_dlm(void)
 {
int error;
-- 
1.5.3.3

[Cluster-devel] [PATCH] dlm: don't print common non-errors

2008-01-21 Thread David Teigland

Change log_error() to log_debug() for conditions that can occur in
large number in normal operation.

Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/lock.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 7bc6ad9..63fe74d 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -4258,7 +4258,7 @@ int dlm_recover_master_copy(struct dlm_ls *ls, struct dlm_rcom *rc)
 	put_rsb(r);
  out:
 	if (error)
-		log_print("recover_master_copy %d %x", error, rl->rl_lkid);
+		log_debug(ls, "recover_master_copy %d %x", error, rl->rl_lkid);
 	rl->rl_result = error;
 	return error;
 }
-- 
1.5.3.3

[Cluster-devel] [PATCH] dlm: use fixed errno values in messages

2008-01-21 Thread David Teigland

Some errno values differ across platforms. So if we return things like
-EINPROGRESS from one node it can get misinterpreted or rejected on
another one.

This patch fixes up the errno values passed on the wire so that they
match the x86 ones (so as not to break the protocol), and re-instates
the platform-specific ones at the other end.

Many thanks to Fabio for testing this patch.
Initial patch from Patrick.

Signed-off-by: Patrick Caulfield [EMAIL PROTECTED]
Signed-off-by: Fabio M. Di Nitto [EMAIL PROTECTED]
Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/util.c |   57 +++--
 1 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/util.c b/fs/dlm/util.c
index 38dcfeb..11c6a45 100644
--- a/fs/dlm/util.c
+++ b/fs/dlm/util.c
@@ -14,6 +14,14 @@
 #include "rcom.h"
 #include "util.h"
 
+#define DLM_ERRNO_EDEADLK      35
+#define DLM_ERRNO_EBADR        53
+#define DLM_ERRNO_EBADSLT      57
+#define DLM_ERRNO_EPROTO       71
+#define DLM_ERRNO_EOPNOTSUPP   95
+#define DLM_ERRNO_ETIMEDOUT   110
+#define DLM_ERRNO_EINPROGRESS 115
+
 static void header_out(struct dlm_header *hd)
 {
 	hd->h_version		= cpu_to_le32(hd->h_version);
@@ -30,6 +38,51 @@ static void header_in(struct dlm_header *hd)
 	hd->h_length		= le16_to_cpu(hd->h_length);
 }
 
+/* higher errno values are inconsistent across architectures, so select
+   one set of values for on the wire */
+
+static int to_dlm_errno(int err)
+{
+   switch (err) {
+   case -EDEADLK:
+   return -DLM_ERRNO_EDEADLK;
+   case -EBADR:
+   return -DLM_ERRNO_EBADR;
+   case -EBADSLT:
+   return -DLM_ERRNO_EBADSLT;
+   case -EPROTO:
+   return -DLM_ERRNO_EPROTO;
+   case -EOPNOTSUPP:
+   return -DLM_ERRNO_EOPNOTSUPP;
+   case -ETIMEDOUT:
+   return -DLM_ERRNO_ETIMEDOUT;
+   case -EINPROGRESS:
+   return -DLM_ERRNO_EINPROGRESS;
+   }
+   return err;
+}
+
+static int from_dlm_errno(int err)
+{
+   switch (err) {
+   case -DLM_ERRNO_EDEADLK:
+   return -EDEADLK;
+   case -DLM_ERRNO_EBADR:
+   return -EBADR;
+   case -DLM_ERRNO_EBADSLT:
+   return -EBADSLT;
+   case -DLM_ERRNO_EPROTO:
+   return -EPROTO;
+   case -DLM_ERRNO_EOPNOTSUPP:
+   return -EOPNOTSUPP;
+   case -DLM_ERRNO_ETIMEDOUT:
+   return -ETIMEDOUT;
+   case -DLM_ERRNO_EINPROGRESS:
+   return -EINPROGRESS;
+   }
+   return err;
+}
+
 void dlm_message_out(struct dlm_message *ms)
 {
 	struct dlm_header *hd = (struct dlm_header *) ms;
@@ -53,7 +106,7 @@ void dlm_message_out(struct dlm_message *ms)
 	ms->m_rqmode		= cpu_to_le32(ms->m_rqmode);
 	ms->m_bastmode		= cpu_to_le32(ms->m_bastmode);
 	ms->m_asts		= cpu_to_le32(ms->m_asts);
-	ms->m_result		= cpu_to_le32(ms->m_result);
+	ms->m_result		= cpu_to_le32(to_dlm_errno(ms->m_result));
 }
 
 void dlm_message_in(struct dlm_message *ms)
@@ -79,7 +132,7 @@ void dlm_message_in(struct dlm_message *ms)
 	ms->m_rqmode		= le32_to_cpu(ms->m_rqmode);
 	ms->m_bastmode		= le32_to_cpu(ms->m_bastmode);
 	ms->m_asts		= le32_to_cpu(ms->m_asts);
-	ms->m_result		= le32_to_cpu(ms->m_result);
+	ms->m_result		= from_dlm_errno(le32_to_cpu(ms->m_result));
 }
 
 static void rcom_lock_out(struct rcom_lock *rl)
-- 
1.5.3.3

[Cluster-devel] [PATCH] dlm: swap bytes for rcom lock reply

2008-01-21 Thread David Teigland

From: Fabio M. Di Nitto [EMAIL PROTECTED]

DLM_RCOM_LOCK_REPLY messages need byte swapping.

Signed-off-by: Fabio M. Di Nitto [EMAIL PROTECTED]
Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/util.c |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/dlm/util.c b/fs/dlm/util.c
index 963889c..38dcfeb 100644
--- a/fs/dlm/util.c
+++ b/fs/dlm/util.c
@@ -137,7 +137,7 @@ void dlm_rcom_out(struct dlm_rcom *rc)
 	rc->rc_seq		= cpu_to_le64(rc->rc_seq);
 	rc->rc_seq_reply	= cpu_to_le64(rc->rc_seq_reply);
 
-	if (type == DLM_RCOM_LOCK)
+	if ((type == DLM_RCOM_LOCK) || (type == DLM_RCOM_LOCK_REPLY))
 		rcom_lock_out((struct rcom_lock *) rc->rc_buf);
 
 	else if (type == DLM_RCOM_STATUS_REPLY)
@@ -147,6 +147,7 @@ void dlm_rcom_out(struct dlm_rcom *rc)
 void dlm_rcom_in(struct dlm_rcom *rc)
 {
 	struct dlm_header *hd = (struct dlm_header *) rc;
+	int type;
 
 	header_in(hd);
 
@@ -156,10 +157,12 @@ void dlm_rcom_in(struct dlm_rcom *rc)
 	rc->rc_seq		= le64_to_cpu(rc->rc_seq);
 	rc->rc_seq_reply	= le64_to_cpu(rc->rc_seq_reply);
 
-	if (rc->rc_type == DLM_RCOM_LOCK)
+	type = rc->rc_type;
+
+	if ((type == DLM_RCOM_LOCK) || (type == DLM_RCOM_LOCK_REPLY))
 		rcom_lock_in((struct rcom_lock *) rc->rc_buf);
 
-	else if (rc->rc_type == DLM_RCOM_STATUS_REPLY)
+	else if (type == DLM_RCOM_STATUS_REPLY)
 		rcom_config_in((struct rcom_config *) rc->rc_buf);
 }
 
-- 
1.5.3.3

[Cluster-devel] [PATCH] dlm: bind connections from known local address when using TCP

2008-01-21 Thread David Teigland

From: Lon Hohberger [EMAIL PROTECTED]

A common problem occurs when multiple IP addresses within the same
subnet are assigned to the same NIC.  If we make a connection attempt to
another address on the same subnet as one of those addresses, the
connection attempt will not necessarily be routed from the address we
want.

In the case of the DLM, the other nodes will quickly drop the connection
attempt, causing problems.

This patch makes the DLM bind to the local address it acquired from the
cluster manager when using TCP prior to making a connection, obviating
the need for administrators to fix their systems or use clever routing
tricks.

Signed-off-by: Lon Hohberger [EMAIL PROTECTED]
Signed-off-by: Patrick Caulfield [EMAIL PROTECTED]
Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/lowcomms.c |   13 -
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index e9923ca..0bea802 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -864,7 +864,7 @@ static void sctp_init_assoc(struct connection *con)
 static void tcp_connect_to_sock(struct connection *con)
 {
 	int result = -EHOSTUNREACH;
-	struct sockaddr_storage saddr;
+	struct sockaddr_storage saddr, src_addr;
 	int addr_len;
 	struct socket *sock;
 
@@ -898,6 +898,17 @@ static void tcp_connect_to_sock(struct connection *con)
 	con->connect_action = tcp_connect_to_sock;
 	add_sock(sock, con);
 
+	/* Bind to our cluster-known address connecting to avoid
+	   routing problems */
+	memcpy(&src_addr, dlm_local_addr[0], sizeof(src_addr));
+	make_sockaddr(&src_addr, 0, &addr_len);
+	result = sock->ops->bind(sock, (struct sockaddr *) &src_addr,
+				 addr_len);
+	if (result < 0) {
+		printk("dlm: could not bind for connect: %d\n", result);
+		/* This *may* not indicate a critical error */
+	}
+
 	make_sockaddr(&saddr, dlm_config.ci_tcp_port, &addr_len);
 
 	log_print("connecting to %d", con->nodeid);
-- 
1.5.3.3
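
The same bind-before-connect technique can be tried from userspace with the
ordinary sockets API.  The sketch below is an illustration only, not the DLM
code: tcp_socket_from() and show_usage() are hypothetical names, it handles
IPv4 only, and unlike the patch it treats a failed bind as fatal.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Create a TCP socket bound to src_ip with port 0 (kernel picks the port),
   so that a later connect() uses src_ip as its source address.
   Returns the socket fd, or -1 on error. */
static int tcp_socket_from(const char *src_ip)
{
	struct sockaddr_in src;
	int fd;

	memset(&src, 0, sizeof(src));
	src.sin_family = AF_INET;
	src.sin_port = htons(0);
	if (inet_pton(AF_INET, src_ip, &src.sin_addr) != 1)
		return -1;	/* not a valid IPv4 address */

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	if (bind(fd, (struct sockaddr *) &src, sizeof(src)) < 0) {
		close(fd);
		return -1;
	}
	return fd;	/* caller goes on to connect(fd, ...) as usual */
}

/* Usage: the local address is already fixed before connect() runs. */
static void show_usage(void)
{
	struct sockaddr_in local;
	socklen_t len = sizeof(local);
	int fd = tcp_socket_from("127.0.0.1");

	if (fd < 0)
		return;		/* no loopback available; nothing to show */
	assert(getsockname(fd, (struct sockaddr *) &local, &len) == 0);
	assert(local.sin_addr.s_addr == htonl(INADDR_LOOPBACK));
	close(fd);
}
```

Binding with port 0 leaves the port choice to the kernel, which is what the
patch does by passing 0 to make_sockaddr() for the source address.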

[Cluster-devel] [PATCH] dlm: clear ast_type when removing from astqueue

2008-01-21 Thread David Teigland

The lkb_ast_type field indicates whether the lkb is on the astqueue list.
When clearing locks for a process, lkb's were being removed from the astqueue
list without clearing the field.  If release_lockspace then happened
immediately afterward, it could try to remove the lkb from the list a second
time.

Appears when a process calls libdlm dlm_release_lockspace(), which first
closes the ls dev, triggering clear_proc_locks, and then removes the ls
(a write to the control dev), causing release_lockspace().

Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/lock.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index ddb4628..43ca2a3 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -4678,6 +4678,7 @@ void dlm_clear_proc_locks(struct dlm_ls *ls, struct dlm_user_proc *proc)
 	}
 
 	list_for_each_entry_safe(lkb, safe, &proc->asts, lkb_astqueue) {
+		lkb->lkb_ast_type = 0;
 		list_del(&lkb->lkb_astqueue);
 		dlm_put_lkb(lkb);
 	}
-- 
1.5.3.3
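
The underlying rule (a "this entry is on the list" flag must be cleared in
the same step as the unlink) can be shown with a small standalone list.
This is a sketch under assumptions, not the kernel list implementation:
struct node, on_queue and the queue_*() helpers are invented for the
example, with on_queue playing the role that lkb_ast_type plays above.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *prev, *next;
	int on_queue;	/* nonzero while the node is linked */
};

static void queue_init(struct node *head)
{
	head->prev = head->next = head;
	head->on_queue = 0;
}

static void queue_add(struct node *head, struct node *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
	n->on_queue = 1;
}

/* Unlink n only if the flag says it is linked, and clear the flag in the
   same step, so a second caller sees the node as already removed.
   Returns 1 if the node was unlinked, 0 if there was nothing to do. */
static int queue_del(struct node *n)
{
	if (!n->on_queue)
		return 0;
	n->on_queue = 0;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
	return 1;
}

static void check_double_remove(void)
{
	struct node head, a;

	queue_init(&head);
	queue_add(&head, &a);
	assert(queue_del(&a) == 1);
	assert(queue_del(&a) == 0);	/* second removal is a harmless no-op */
	assert(head.next == &head && head.prev == &head);
}
```

If queue_del() left on_queue set, the second call would walk stale prev/next
pointers, which is the double removal described in the commit message.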

[Cluster-devel] [PATCH] dlm: recover locks waiting for overlap replies

2008-01-21 Thread David Teigland

When recovery looks at locks waiting for replies, it fails to consider
locks that have already received a reply for their first remote operation,
but not received a reply for secondary, overlapping unlock/cancel.  The
appropriate stub reply needs to be called for these waiters.

Appears when we start doing recovery in the presence of many overlapping
unlock/cancel ops.

Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/lock.c |   37 -
 1 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 43ca2a3..a758f1b 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3846,6 +3846,7 @@ static int waiter_needs_recovery(struct dlm_ls *ls, struct dlm_lkb *lkb)
 void dlm_recover_waiters_pre(struct dlm_ls *ls)
 {
 	struct dlm_lkb *lkb, *safe;
+	int wait_type, stub_unlock_result, stub_cancel_result;
 
 	mutex_lock(&ls->ls_waiters_mutex);
 
@@ -3864,7 +3865,33 @@ void dlm_recover_waiters_pre(struct dlm_ls *ls)
 		if (!waiter_needs_recovery(ls, lkb))
 			continue;
 
-		switch (lkb->lkb_wait_type) {
+		wait_type = lkb->lkb_wait_type;
+		stub_unlock_result = -DLM_EUNLOCK;
+		stub_cancel_result = -DLM_ECANCEL;
+
+		/* Main reply may have been received leaving a zero wait_type,
+		   but a reply for the overlapping op may not have been
+		   received.  In that case we need to fake the appropriate
+		   reply for the overlap op. */
+
+		if (!wait_type) {
+			if (is_overlap_cancel(lkb)) {
+				wait_type = DLM_MSG_CANCEL;
+				if (lkb->lkb_grmode == DLM_LOCK_IV)
+					stub_cancel_result = 0;
+			}
+			if (is_overlap_unlock(lkb)) {
+				wait_type = DLM_MSG_UNLOCK;
+				if (lkb->lkb_grmode == DLM_LOCK_IV)
+					stub_unlock_result = -ENOENT;
+			}
+
+			log_debug(ls, "rwpre overlap %x %x %d %d %d",
+				  lkb->lkb_id, lkb->lkb_flags, wait_type,
+				  stub_cancel_result, stub_unlock_result);
+		}
+
+		switch (wait_type) {
 
 		case DLM_MSG_REQUEST:
 			lkb->lkb_flags |= DLM_IFL_RESEND;
@@ -3877,7 +3904,7 @@ void dlm_recover_waiters_pre(struct dlm_ls *ls)
 		case DLM_MSG_UNLOCK:
 			hold_lkb(lkb);
 			ls->ls_stub_ms.m_type = DLM_MSG_UNLOCK_REPLY;
-			ls->ls_stub_ms.m_result = -DLM_EUNLOCK;
+			ls->ls_stub_ms.m_result = stub_unlock_result;
 			ls->ls_stub_ms.m_flags = lkb->lkb_flags;
 			_receive_unlock_reply(lkb, &ls->ls_stub_ms);
 			dlm_put_lkb(lkb);
@@ -3886,15 +3913,15 @@ void dlm_recover_waiters_pre(struct dlm_ls *ls)
 		case DLM_MSG_CANCEL:
 			hold_lkb(lkb);
 			ls->ls_stub_ms.m_type = DLM_MSG_CANCEL_REPLY;
-			ls->ls_stub_ms.m_result = -DLM_ECANCEL;
+			ls->ls_stub_ms.m_result = stub_cancel_result;
 			ls->ls_stub_ms.m_flags = lkb->lkb_flags;
 			_receive_cancel_reply(lkb, &ls->ls_stub_ms);
 			dlm_put_lkb(lkb);
 			break;
 
 		default:
-			log_error(ls, "invalid lkb wait_type %d",
-				  lkb->lkb_wait_type);
+			log_error(ls, "invalid lkb wait_type %d %d",
+				  lkb->lkb_wait_type, wait_type);
 		}
 		schedule();
 	}
-- 
1.5.3.3

[Cluster-devel] [PATCH] dlm: another call to confirm_master in receive_request_reply

2008-01-21 Thread David Teigland

When a failed request (EBADR or ENOTBLK) is unlocked/canceled instead of
retried, there may be other lkb's waiting on the rsb_lookup list for it
to complete.  A call to confirm_master() is needed to move on to the next
waiting lkb since the current one won't be retried.

Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/lock.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index a758f1b..d5e8ea1 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -1940,8 +1940,11 @@ static void confirm_master(struct dlm_rsb *r, int error)
 		break;
 
 	case -EAGAIN:
-		/* the remote master didn't queue our NOQUEUE request;
-		   make a waiting lkb the first_lkid */
+	case -EBADR:
+	case -ENOTBLK:
+		/* the remote request failed and won't be retried (it was
+		   a NOQUEUE, or has been canceled/unlocked); make a waiting
+		   lkb the first_lkid */
 
 		r->res_first_lkid = 0;
 
@@ -3382,6 +3385,7 @@ static void receive_request_reply(struct dlm_ls *ls, struct dlm_message *ms)
 			if (is_overlap(lkb)) {
 				/* we'll ignore error in cancel/unlock reply */
 				queue_cast_overlap(r, lkb);
+				confirm_master(r, result);
 				unhold_lkb(lkb); /* undoes create_lkb() */
 			} else
 				_request_lock(r, lkb);
-- 
1.5.3.3

[Cluster-devel] [PATCH] dlm: limit dir lookup loop

2008-01-21 Thread David Teigland

In a rare case we may need to repeat a local resource directory lookup
due to a race with removing the rsb and removing the resdir record.
We'll never need to do more than a single additional lookup, though,
so the infinite loop around the lookup can be removed.  In addition
to being unnecessary, the infinite loop is dangerous since some other
unknown condition may appear causing the loop to never break.

Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/lock.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index fa68e9b..bc2e4ba 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -1851,7 +1851,7 @@ static void send_blocking_asts_all(struct dlm_rsb *r, struct dlm_lkb *lkb)
 static int set_master(struct dlm_rsb *r, struct dlm_lkb *lkb)
 {
 	struct dlm_ls *ls = r->res_ls;
-	int error, dir_nodeid, ret_nodeid, our_nodeid = dlm_our_nodeid();
+	int i, error, dir_nodeid, ret_nodeid, our_nodeid = dlm_our_nodeid();
 
 	if (rsb_flag(r, RSB_MASTER_UNCERTAIN)) {
 		rsb_clear_flag(r, RSB_MASTER_UNCERTAIN);
@@ -1885,7 +1885,7 @@ static int set_master(struct dlm_rsb *r, struct dlm_lkb *lkb)
 		return 1;
 	}
 
-	for (;;) {
+	for (i = 0; i < 2; i++) {
 		/* It's possible for dlm_scand to remove an old rsb for
 		   this same resource from the toss list, us to create
 		   a new one, look up the master locally, and find it
@@ -1899,6 +1899,8 @@ static int set_master(struct dlm_rsb *r, struct dlm_lkb *lkb)
 		log_debug(ls, "dir_lookup error %d %s", error, r->res_name);
 		schedule();
 	}
+	if (error && error != -EEXIST)
+		return error;
 
 	if (ret_nodeid == our_nodeid) {
 		r->res_first_lkid = 0;
-- 
1.5.3.3
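
The general pattern of replacing an unbounded retry loop with a bounded one
can be sketched as follows.  This is a simplified illustration, not the
set_master() logic: bounded_lookup(), lookup_fn and flaky_lookup() are
hypothetical names, and this version retries only on -EEXIST, which is an
assumption made to keep the example small.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lookup callback: returns 0 on success or a negative errno. */
typedef int (*lookup_fn)(void *arg);

/* Call fn at most max_tries times.  Retry only while it reports -EEXIST
   (standing in for the transient race); return any other result at once.
   The loop cannot spin forever, whatever fn does. */
static int bounded_lookup(lookup_fn fn, void *arg, int max_tries)
{
	int i, error = -EEXIST;

	for (i = 0; i < max_tries; i++) {
		error = fn(arg);
		if (error != -EEXIST)
			break;
	}
	return error;
}

/* Example callback: fails with -EEXIST until *remaining reaches zero. */
static int flaky_lookup(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return -EEXIST;
	}
	return 0;
}

static void check_bounded(void)
{
	int failures = 1;

	/* One transient failure is absorbed by the single extra attempt. */
	assert(bounded_lookup(flaky_lookup, &failures, 2) == 0);

	/* A persistent failure is reported instead of looping forever. */
	failures = 5;
	assert(bounded_lookup(flaky_lookup, &failures, 2) == -EEXIST);
}
```

With max_tries set to 2 this matches the commit message's reasoning that a
single additional lookup is the most that should ever be needed.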

[Cluster-devel] [PATCH] dlm: change error message to debug

2008-01-21 Thread David Teigland

The invalid lockspace messages are normal and can appear relatively
often.  They should be suppressed without debugging enabled.

Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/lock.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index bc2e4ba..7ee7c7c 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3857,8 +3857,9 @@ void dlm_receive_buffer(struct dlm_header *hd, int nodeid)
 
 	ls = dlm_find_lockspace_global(hd->h_lockspace);
 	if (!ls) {
-		log_print("invalid h_lockspace %x from %d cmd %d type %d",
-			  hd->h_lockspace, nodeid, hd->h_cmd, type);
+		if (dlm_config.ci_log_debug)
+			log_print("invalid lockspace %x from %d cmd %d type %d",
+				  hd->h_lockspace, nodeid, hd->h_cmd, type);
 
 		if (hd->h_cmd == DLM_RCOM && type == DLM_RCOM_STATUS)
 			dlm_send_ls_not_ready(nodeid, rc);
-- 
1.5.3.3

[Cluster-devel] second batch of dlm patches for 2.6.25

2008-02-06 Thread David Teigland

I've sent a second batch of dlm patches to lkml for review prior to
another pull request for 2.6.25, beginning here:

http://lkml.org/lkml/2008/2/7/10

They are also in the test branch of dlm.git:

http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/teigland/dlm.git;a=shortlog;h=test

[Cluster-devel] Re: [2.6 patch] make dlm_print_rsb() static

2008-02-19 Thread David Teigland

On Wed, Feb 13, 2008 at 11:29:38PM +0200, Adrian Bunk wrote:
 dlm_print_rsb() can now become static.
 
 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

Thanks, added to dlm.git.
Dave

Re: [Cluster-devel] STABLE2 cluster branch

2008-03-03 Thread David Teigland

On Sat, Mar 01, 2008 at 02:52:05PM -0700, Steven Dake wrote:
 This is reasonable but requires having quite a bit of conditional
 compilation in cman and other tools.  I don't know if anyone is working
 on this, but I'd imagine maintenance of such a scheme would be
 complicated since the trunk of whitetank is about to rev into high speed
 modification requiring different dependencies of the gfs userland.

 If we are to say this conditional compilation only works with trunk of
 openais up to a certain point such as version 0.84 then that certain
 point becomes a branch point which I really do not want.  What I
 prefer is that trunk of gfs userland be munged to work with the new
 corosync dependency and once that has all stabilized create a new branch
 of userland to work with the corosync 1.0 infrastructure.  The complete
 software suite then would be stable3 + corosync 1.X + trunk of
 openais ais services for the checkpoint service.

So it sounds like the next stable release of openais will be in the new
form of corosync + openais?  Will Fedora 9 have whitetank or the new
corosync+openais release?

We definitely need to do a release or two of cluster-2.y.z from STABLE2
based on openais whitetank.  Then, once a stable release of
corosync+openais exists, I see sense in either:

1. switching STABLE2 from whitetank to the corosync+openais release
2. supporting both whitetank and corosync in STABLE2 somehow, perhaps
   dropping whitetank support after a while

1 would make most sense if F9 has corosync, 2 would make most sense if F9
has whitetank.

Dave

Re: [Cluster-devel] STABLE2 cluster branch

2008-03-03 Thread David Teigland

On Mon, Mar 03, 2008 at 05:10:54PM +0100, Fabio M. Di Nitto wrote:
 If we are to say this conditional compilation only works with trunk of
 openais up to a certain point such as version 0.84 then that certain
 point becomes a branch point which I really do not want.  What I
 prefer is that trunk of gfs userland be munged to work with the new
 corosync dependency and once that has all stabilized create a new branch
 of userland to work with the corosync 1.0 infrastructure.  The complete
 software suite then would be stable3 + corosync 1.X + trunk of
 openais ais services for the checkpoint service.
 
 So it sounds like the next stable release of openais will be in the new
 form of corosync + openais?  Will Fedora 9 have whitetank or the new
 corosync+openais release?
 
 We definitely need to do a release or two of cluster-2.y.z from STABLE2
 based on openais whitetank.  Then, once a stable release of
 corosync+openais exists, I see sense in either:
 
 1. switching STABLE2 from whitetank to the corosync+openais release
 2. supporting both whitetank and corosync in STABLE2 somehow, perhaps
   dropping whitetank support after a while
 
 1 would make most sense if F9 has corosync, 2 would make most sense if F9
 has whitetank.
 
 Clearly STABLE2 is running on trunk and what would be corosync+openais 
 hopefully in not too long from now.
 
 Does it make sense to roll back to whitetank and back in such short time? 
 Let's keep in mind that if we push out stable releases into distro with 
 the stable2+whitetank combo, i assume we will need to keep supporting it 
 for a while before turning stable2 to support corosync.
 
 Hence my general idea of just #ifdeffing openais support in stable2 to 
 handle both whitetank and corosync at build time (no runtime detection) 
 and let the users/distros decide what combo they prefer.
 
 If you look at it:
 
 whitetank does not change. stable2 support will only need roll back.
 
 trunk changes in openais. our master follows openais trunk. Commit the 
 diff into stable2. It's going to be just a bit painful in the very 
 beginning but at the end it's a matter of a cherry pick or almost.

Yeah, good point.
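
For what it is worth, the build-time selection discussed above could look
roughly like this.  It is only a sketch: HAVE_COROSYNC is a hypothetical
macro that the configure script would pass as -DHAVE_COROSYNC, and
cluster_stack() is an invented helper; neither exists in the tree.

```c
#include <assert.h>
#include <string.h>

/* HAVE_COROSYNC is a hypothetical macro, assumed to be set by the build
   system when compiling against corosync; without it we assume whitetank. */
#ifdef HAVE_COROSYNC
#define CLUSTER_STACK "corosync+openais"
#else
#define CLUSTER_STACK "openais whitetank"
#endif

/* Report which stack this binary was built for.  The choice is made by the
   preprocessor, so there is no runtime detection. */
static const char *cluster_stack(void)
{
	return CLUSTER_STACK;
}

static void check_stack(void)
{
	const char *s = cluster_stack();

	assert(strcmp(s, "corosync+openais") == 0 ||
	       strcmp(s, "openais whitetank") == 0);
}
```

Real code would wrap the differing includes and API calls in the same
#ifdef rather than a string, but the mechanism is the same.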

Re: [Cluster-devel] STABLE2 cluster branch

2008-03-03 Thread David Teigland

On Mon, Mar 03, 2008 at 10:07:26AM -0700, Steven Dake wrote:
 On Mon, 2008-03-03 at 09:10 -0600, David Teigland wrote:
  On Sat, Mar 01, 2008 at 02:52:05PM -0700, Steven Dake wrote:
   This is reasonable but requires having quite a bit of conditional
   compilation in cman and other tools.  I don't know if anyone is working
   on this, but I'd imagine maintenance of such a scheme would be
   complicated since the trunk of whitetank is about to rev into high speed
   modification requiring different dependencies of the gfs userland.
  
   If we are to say this conditional compilation only works with trunk of
   openais up to a certain point such as version 0.84 then that certain
   point becomes a branch point which I really do not want.  What I
   prefer is that trunk of gfs userland be munged to work with the new
   corosync dependency and once that has all stabilized create a new branch
   of userland to work with the corosync 1.0 infrastructure.  The complete
   software suite then would be stable3 + corosync 1.X + trunk of
   openais ais services for the checkpoint service.
  
  So it sounds like the next stable release of openais will be in the new
  form of corosync + openais?  Will Fedora 9 have whitetank or the new
  corosync+openais release?
  
  We definitely need to do a release or two of cluster-2.y.z from STABLE2
  based on openais whitetank.  Then, once a stable release of
  corosync+openais exists, I see sense in either:
  
  1. switching STABLE2 from whitetank to the corosync+openais release
  2. supporting both whitetank and corosync in STABLE2 somehow, perhaps
 dropping whitetank support after a while
  
  1 would make most sense if F9 has corosync, 2 would make most sense if F9
  has whitetank.
  
 
 I agree we need to release stable2 with the current whitetank.
 
 While I would like to have corosync enabled for F9, it won't be ready in
 time for that distribution.  The corosync tree hasn't yet emerged so
 targeting f9 is a bit premature.
 
 Unfortunately this creates quite a bit more work WRT ifdeffing of the
 code to support either corosync or whitetank.  I don't mind helping with
 the rest of the infrastructure conversion to corosync in the trunk of
 the gfs tree, but keeping stable2 operational with both sounds like a
 lot of difficult work.
 
 If the distributions really need it, however, it is something we should
 address.  I believe really what we need is a stable 3 which is branched
 from trunk to work with corosync once corosync and trunk have met some
 level of capabilities (like it compiles, works, and passes heavy stress
 testing).  But maybe this creating of stable3 is more work than making
 stable2 work with both openais and corosync.

We already have two parallel branches of the cluster2 (second generation)
code: RHEL5 and STABLE2; we really don't want a third.

Given that, we have four options for STABLE2:
1. work with whitetank
2. work with corosync
3. use ifdefs to work with both at once
4. work with whitetank for now, switch to corosync once it's stable


 In my opinion though, a stable branch shouldn't add features or entirely
 new foundations for the code such as a new infrastructure.  So I'm not
 sure why to call it stable2 if in fact it is a stable trunk :)

As I mentioned above, I was assuming we'd wait to convert STABLE2 to
corosync until there was actually a stable, released version of it.

[Cluster-devel] avoid polluting the git history

2008-03-05 Thread David Teigland

I've gone and looked at the cluster.git history and realize that we're not
using git quite as we should.  We're polluting the cluster.git history
with unnecessary merge commits.  It's not that git is being used wrongly
per se, just not nicely.  Not to blame anyone, but pull up git web in
your browser and notice these kinds of entries in the history:

Lon Hohberger Merge branch 'master' of ssh://[EMAIL PROTECTED] ...

Ryan McCabe Merge branch 'master' of ssh://sources.redhat.com/git ...

Christine Caulfield Merge branch 'cman3'

Chris Feist Merge branch 'RHEL4' of ssh://sources.redhat.com/git ...


The short and easy answer for how to avoid this is that between these two
steps:
1. creating a new branch that you will add commits to and eventually push from
2. actually pushing from that branch

you should not do any merges within that branch.  That includes not doing
any git-pull's in that branch (pull does a merge).

If you google for "git avoiding merge commits" you'll find a number of git
howto's for various projects that explain this problem (and how to avoid
it) differently.

There are two main ways to use git:
- merging mode
- rebasing mode

Git was designed especially for the merging style where there's a lot of
pulling and a large hierarchy of maintainers, like the linux kernel.  A
large portion of the documentation focuses on how to use git in the
merging/pulling mode.  This is not for us; we have a completely flat
hierarchy.

In the rebasing style, git is largely just a fancy patch manager like
quilt.  This is the appropriate method for us.  It means that you only
really use git pull to update the branches that you don't use directly.
With the branches you *do* use (branched from the former ones that you
update by pulling), you only typically use git-commit, git-rebase, and
git-cherry-pick, which all simply apply commits one on top of another as
patches; they don't do merges.

git-branch
  master
* rhel5

(rhel5 was created via git checkout -b rhel5 origin/RHEL5, I don't ever
commit anything to rhel5)

git pull

(updates rhel5 branch with all the latest changes in the repository)

git checkout -b fix-bz1234 rhel5

(create a branch to do some work)

git commit

git push ssh://sources.redhat.com/git/cluster.git fix-bz1234:RHEL5

(if this fails, then someone has checked something into the RHEL5 branch
since I updated my copy via the pull)

To resolve this,

git checkout rhel5

git pull

(updates rhel5 branch with the new commit)

Now there are two ways to recreate your branch with fixes without doing a
merge.

Option 1, using git-rebase
--------------------------

(we were in the now updated rhel5 branch)

git checkout fix-bz1234

git format-patch -1 --stdout > /tmp/fix-bz1234.patch

(save work in case something goes wrong)

git rebase rhel5

(this places your commit on top of the latest rhel5 code without doing a
merge and a merge commit, which we are trying to avoid.  Do not use
git-pull to combine the latest rhel5 code with your fix-bz1234 branch, as
that will create an unnecessary merge commit.)

git push ssh://sources.redhat.com/git/cluster.git fix-bz1234:RHEL5

(should now work)

Option 2, using git-cherry-pick
-------------------------------

(we were in the now updated rhel5 branch)

git checkout -b fix-bz1234-try2

git log fix-bz1234
and copy the SHA1 of the commit you made

git-cherry-pick <SHA1 of your commit>

git push ssh://sources.redhat.com/git/cluster.git fix-bz1234-try2:RHEL5

(should now work)



There is another good use for git-pull, and that is to combine multiple
local branches together into a local collection that you want to test.
Say you have local branches fix-1234 and fix-5678, both based off the
rhel5 branch, and you want to combine the fixes to test them together.
You'd do something like this:

git checkout rhel5

git checkout -b ALL rhel5

git pull . fix-1234

git pull . fix-5678

Now you can test the ALL branch, but you shouldn't commit from it.

Re: [Cluster-devel] cluster-2.02.00

2008-03-11 Thread David Teigland

On Tue, Mar 11, 2008 at 03:24:24PM +0100, Kadlecsik Jozsef wrote:
 On Thu, 6 Mar 2008, David Teigland wrote:
 
  A new source tarball of cluster code has been released: cluster-2.02.00
 
 Is there a changelog available somewhere? I could not find it in the 
 tarball.

No, sorry, I wish there was.  Maybe we can go back and figure out which
commit cluster-2.01 was created at and tag it; then git could give us
something, at least (although our cvs commit messages were often lacking.)

Dave

Re: [Cluster-devel] libdlm dlm_ls_lock_wait() doesn't.

2008-03-20 Thread David Teigland

On Wed, Mar 19, 2008 at 03:35:11PM -0700, Joel Becker wrote:
 Folks,
   Another problem I've run into with libdlm - call
 dlm_ls_lock_wait() on a lock that another node holds, and it returns
 instead of blocking.  This is not a trylock (LKF_NOQUEUE).  Trylocks
 work as expected.  A blocking lock attempt does not block, it just
 fails.  I haven't had the time to nail it yet, so if you get there
 first, excellent.

I've tested both threaded and non-threaded dlm_ls_lock_wait() and they
seem to work for me.  A mistake that I got hung up on for a while was that
a non-threaded program must link against libdlm_lt, not libdlm.

So, a threaded program needs: -D_REENTRANT -lpthread -ldlm
and a non-threaded program needs: -ldlm_lt

Also, in a threaded program, you need to call dlm_ls_pthread_init(handle);
right after creating the lockspace.  I'm not sure what the symptoms would
be if you left out the pthread_init().  The symptoms when I mistakenly
linked my non-threaded program with libdlm were that dlm_ls_lock_wait()
didn't return at all.

[Cluster-devel] cluster-2.03.00

2008-04-11 Thread David Teigland

A new source tarball of cluster code has been released: cluster-2.03.00
This has been taken from the STABLE2 branch in the cluster git tree.  It
is compatible with the current stable release of openais (0.80.3), and the
current stable release of the kernel (2.6.24).

  ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.00.tar.gz

To use gfs, a kernel patch is required to export three symbols from gfs2:
  ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch


Abhijith Das (3):
  gfs2_tool: remove 'gfs2_tool counters' as they aren't implemented anymore
  gfs-kernel: fix for bz 429343 gfs_glock_is_locked_by_me assertion
  gfs2_tool manpage: gfs2_tool counters doesn't exist anymore.

Andrew Price (1):
  [BUILD] Warn and continue if CONFIG_KERNELVERSION is not found

Bob Peterson (9):
  Resolves: bz 435917: GFS2: mkfs.gfs2 default lock protocol
  Resolves: bz 421761: 'gfs_tool lockdump' wrongly says 'unknown
  Resolves: bz 431945: GFS: gfs-kernel should use device major:minor
  Update to prior commit for bz431945: I forgot that STABLE2
  Resolves: bz 436383: GFS filesystem size inconsistent
  Fix savemeta so it saves gfs-1 rg information properly
  Fix gfs2_edit print options (-p) to work properly for gfs-1
  gfs2_edit was not recalculating the max block size after it figured
  Fix some compiler warnings in gfs2_edit

Chris Feist (1):
  Added back in change to description line to make chkconfig work properly.

Christine Caulfield (5):
  [DLM] Don't segfault if lvbptr is NULL
  [CMAN] Free up any queued messages when someone disconnects
  [CMAN] Limit outstanding replies
  [CMAN] valid port number  don't use it before validation
  Remove references to broadcast.

David Teigland (4):
  doc: update usage.txt
  groupd: purge messages from dead nodes
  dlm_tool: print correct rq mode in lockdump
  libdlm: fix lvb copying

Fabio M. Di Nitto (8):
  [BUILD] Fix configure script to handle releases
  [BUILD] Fix build system with openais whitetank
  [BUILD] Allow release version to contain padding 0's
  Add toplevel .gitignore
  [BUILD] Fix handling of version and libraries soname
  [BUILD] Fix man page install permission
  Revert "Fix help message to refer to script as 'fence_scsi_test'."
  Revert "fix bz277781 by accepting nodename as a synonym for node"

Joel Becker (1):
  libdlm: Don't pass LKF_WAIT to the kernel

Jonathan Brassow (4):
  rgmanager/lvm.sh: Fix bug 438816
  rgmanager/lvm.sh:  Fix bug bz242798
  rgmanager/lvm.sh: change argument order of shell command
  rgmanager/lvm.sh:  Minor comment updates

Lon Hohberger (10):
  Add Sybase failover agent
  Update changelog
  Add / fix Oracle 10g failover agent
  [rgmanager] Make ip.sh check link states of non-ethernet devices
  [rgmanager] Set cloexec bit in msg_socket.c
  [rgmanager] Don't call quotaoff if quotas are not used
  [CMAN] Fix "Node X is undead" loop bug
  [rgmanager] Fix #432998
  [cman] Apply missing fix for #315711
  [CMAN] Make cman init script start qdiskd intelligently

Ryan McCabe (1):
  fix bz277781 by accepting nodename as a synonym for node

Ryan O'Hara (15):
  Variable should be quoted in conditional statement.
  Fix unregister code to report failure correctly.
  Remove self parameter. This was used to specify the name of the node
  Fix code to use get_key subroutine.
  Fix split calls to be consistent. Remove the optional LIMIT parameter.
  Replace /var/lock/subsys/${0##*/} with /var/lock/subsys/scsi_reserve.
  Fix success/failure reporting when registering devices at startup.
  Rewrite of get_scsi_devices function.
  Record devices that are successfully registered to /var/run/scsi_reserve.
  Allow 'stop' to release the reservation if and only if there are no other
  Attempt to register the node in the case where it must perform fence_scsi
  Fix help message to refer to script as 'fence_scsi_test'.
  BZ 248715
  BZ: 373491, 373511, 373531, 373541, 373571, 429033
  BZ 441323 : Redirect stderr to /dev/null when getting list of devices.


 .gitignore|1 +
 cman/daemon/Makefile  |3 +-
 cman/daemon/cmanccs.c |   11 +-
 cman/daemon/cnxman-private.h  |2 +-
 cman/daemon/commands.c|2 +-
 cman/daemon/daemon.c  |   40 ++-
 cman/daemon/daemon.h  |3 +-
 cman/init.d/cman.in   |   32 ++
 cman/init.d/qdiskd|   21 +-
 cman/lib/Makefile |   14 +-
 cman/man/cman_tool.8  |   20 +-
 cman/qdisk/main.c |   34 +-
 configure |   87 +++-
 dlm/lib/Makefile  |   26 +-
 dlm/lib/libdlm.c  |   15 +-
 dlm/tool/main.c

Re: [Cluster-devel] cluster-2.03.00

2008-04-14 Thread David Teigland

On Sun, Apr 13, 2008 at 01:56:12PM +0200, Fabio M. Di Nitto wrote:
 On Fri, 11 Apr 2008, David Teigland wrote:
 
 A new source tarball of cluster code has been released: cluster-2.03.00
 This has been taken from the STABLE2 branch in the cluster git tree.  It
 is compatible with the current stable release of openais (0.80.3), and the
 current stable release of the kernel (2.6.24).
 
  ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.00.tar.gz
 
 
 Hi David,
 
 I think I either misunderstood the versioning system or I missed something 
 along the way.
 
 My understanding was that STABLE2 release would have had version 2.02.xx
 where xx is incremental per release and a stable SONAME set to 2.2.
 
 Snapshots from master would have taken 2.99.xx and kept an unstable API 
 set for commodity to 2.9.
 
 I really don't think we should change the SONAME unless there is a need 
 for it.

I never had a clear plan for when we'd increment the middle number vs when
we'd increment the last number; we've always incremented the middle number
in the handful of past releases.  I've also never understood how .so
naming should be done.  Should .so numbers even be associated with the
cluster release numbers?  What does the release number really mean; what
is it useful for?  Similar question for .so numbers, what do they mean,
when should they change, when shouldn't they change?  Since you have a far
better understanding in this area than I do (and you're the one who
actually implements it :-), could you define the rules for us for how this
all works?  I don't have any preferences in the matter; I'm open to
whatever you come up with.

Dave

[Cluster-devel] kernel for building master

2008-04-14 Thread David Teigland

On Fri, Apr 11, 2008 at 04:29:52PM -, [EMAIL PROTECTED] wrote:
 - Log -
 commit 77bce77b5034adf8f00090b13dde7c7d481b0dd9
 Author: David Teigland [EMAIL PROTECTED]
 Date:   Wed Mar 19 16:05:20 2008 -0500
 
 dlm_controld: new version
 
 - uses libcpg directly without libgroup (use the -g0 option)
 - takes over plock handling from gfs_controld
 - interacts with fenced and fs_controld to coordinate recovery (todo)
 - runs in backward compat mode by default, using libgroup to interact
   with old groupd/dlm_controld (-g1 option)
 - plan to add a new default -g2 option that will detect old groupd's in
   the cluster and only run in old mode if any exist
 
 Signed-off-by: David Teigland [EMAIL PROTECTED]
 
 ---

This commit assumes dlm kernel changes that are only available in
linux-next or linux-mm (linux/dlm_plock.h).  This goes against our aim to
keep master building against -rc kernels by default, so the following
patch disables the relevant part for now.  Fabio has said he may turn this
ifdef into something more sophisticated.


diff --git a/group/dlm_controld/main.c b/group/dlm_controld/main.c
index b954f53..25c0796 100644
--- a/group/dlm_controld/main.c
+++ b/group/dlm_controld/main.c
@@ -546,11 +546,13 @@ static int loop(void)
setup_deadlock();
}
 
+#ifdef BUILD_PLOCK
rv = setup_plocks();
if (rv < 0)
goto out;
plock_fd = rv;
plock_ci = client_add(rv, process_plocks, NULL);
+#endif BUILD_PLOCK
}
 
for (;;) {
diff --git a/group/dlm_controld/plock.c b/group/dlm_controld/plock.c
index 4dc38ac..5492cc1 100644
--- a/group/dlm_controld/plock.c
+++ b/group/dlm_controld/plock.c
@@ -12,6 +12,8 @@
 
 #include "dlm_daemon.h"
 #include "config.h"
+
+#ifdef BUILD_PLOCK
 #include <linux/dlm_plock.h>
 
 #define PROC_MISC   "/proc/misc"
@@ -2293,3 +2295,21 @@ int dump_plocks(char *name, int fd)
return 0;
 }
 
+#else
+
+int setup_plocks(void) { return 0; };
+void process_plocks(int ci) { };
+int limit_plocks(void) { return 0; };
+void receive_plock(struct lockspace *ls, struct dlm_header *hd, int len) { };
+void receive_own(struct lockspace *ls, struct dlm_header *hd, int len) { };
+void receive_sync(struct lockspace *ls, struct dlm_header *hd, int len) { };
+void receive_drop(struct lockspace *ls, struct dlm_header *hd, int len) { };
+void process_saved_plocks(struct lockspace *ls) { };
+void close_plock_checkpoint(struct lockspace *ls) { };
+void store_plocks(struct lockspace *ls) { };
+void retrieve_plocks(struct lockspace *ls) { };
+void purge_plocks(struct lockspace *ls, int nodeid, int unmount) { };
+int dump_plocks(char *name, int fd) { return 0; };
+
+#endif
+

[Cluster-devel] [PATCH 1/6] dlm: match signedness between dlm_config_info and cluster_set

2008-04-15 Thread David Teigland

From: Harvey Harrison [EMAIL PROTECTED]

cluster_set is only called from the macro CLUSTER_ATTR which defines read/write
access functions.  Make the signedness match to avoid sparse warnings every time
CLUSTER_ATTR is used (lines 149-159) all of the form:

fs/dlm/config.c:149:1: warning: incorrect type in argument 3 (different 
signedness)
fs/dlm/config.c:149:1:expected unsigned int *info_field
fs/dlm/config.c:149:1:got int extern [toplevel] *noident

Signed-off-by: Harvey Harrison [EMAIL PROTECTED]
Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/config.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/dlm/config.c b/fs/dlm/config.c
index c3ad1df..7ceaea3 100644
--- a/fs/dlm/config.c
+++ b/fs/dlm/config.c
@@ -114,7 +114,7 @@ struct cluster_attribute {
 };
 
 static ssize_t cluster_set(struct cluster *cl, unsigned int *cl_field,
-  unsigned int *info_field, int check_zero,
+  int *info_field, int check_zero,
   const char *buf, size_t len)
 {
unsigned int x;
-- 
1.5.3.3

[Cluster-devel] [PATCH 2/6] dlm: make dlm_print_rsb() static

2008-04-15 Thread David Teigland

From: Adrian Bunk [EMAIL PROTECTED]

dlm_print_rsb() can now become static.

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/lock.c |2 +-
 fs/dlm/lock.h |1 -
 2 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 8f250ac..1e9e8eb 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -165,7 +165,7 @@ void dlm_print_lkb(struct dlm_lkb *lkb)
   lkb->lkb_grmode, lkb->lkb_wait_type, lkb->lkb_ast_type);
 }
 
-void dlm_print_rsb(struct dlm_rsb *r)
+static void dlm_print_rsb(struct dlm_rsb *r)
 {
printk(KERN_ERR "rsb: nodeid %d flags %lx first %x rlc %d name %s\n",
   r->res_nodeid, r->res_flags, r->res_first_lkid,
diff --git a/fs/dlm/lock.h b/fs/dlm/lock.h
index 05d9c82..88e93c8 100644
--- a/fs/dlm/lock.h
+++ b/fs/dlm/lock.h
@@ -13,7 +13,6 @@
 #ifndef __LOCK_DOT_H__
 #define __LOCK_DOT_H__
 
-void dlm_print_rsb(struct dlm_rsb *r);
 void dlm_dump_rsb(struct dlm_rsb *r);
 void dlm_print_lkb(struct dlm_lkb *lkb);
 void dlm_receive_message_saved(struct dlm_ls *ls, struct dlm_message *ms);
-- 
1.5.3.3

[Cluster-devel] [PATCH 4/6] dlm: recover nodes that are removed and re-added

2008-04-15 Thread David Teigland

If a node is removed from a lockspace, and then added back before the
dlm is notified of the removal, the dlm will not detect the removal
and won't clear the old state from the node.  This is fixed by using a
list of added nodes so the membership recovery can detect when a newly
added node is already in the member list.

Signed-off-by: David Teigland [EMAIL PROTECTED]
---
 fs/dlm/config.c   |   48 +++-
 fs/dlm/config.h   |3 ++-
 fs/dlm/dlm_internal.h |4 +++-
 fs/dlm/member.c   |   34 +-
 fs/dlm/recoverd.c |1 +
 5 files changed, 74 insertions(+), 16 deletions(-)

diff --git a/fs/dlm/config.c b/fs/dlm/config.c
index 7ceaea3..eac23bd 100644
--- a/fs/dlm/config.c
+++ b/fs/dlm/config.c
@@ -284,6 +284,7 @@ struct node {
struct list_head list; /* space->members */
int nodeid;
int weight;
+   int new;
 };
 
 static struct configfs_group_operations clusters_ops = {
@@ -565,6 +566,7 @@ static struct config_item *make_node(struct config_group *g, const char *name)
config_item_init_type_name(&nd->item, name, &node_type);
nd->nodeid = -1;
nd->weight = 1;  /* default weight of 1 if none is set */
+   nd->new = 1; /* set to 0 once it's been read by dlm_nodeid_list() */
 
mutex_lock(&sp->members_lock);
list_add(&nd->list, &sp->members);
@@ -805,12 +807,13 @@ static void put_comm(struct comm *cm)
 }
 
 /* caller must free mem */
-int dlm_nodeid_list(char *lsname, int **ids_out)
+int dlm_nodeid_list(char *lsname, int **ids_out, int *ids_count_out,
+   int **new_out, int *new_count_out)
 {
struct space *sp;
struct node *nd;
-   int i = 0, rv = 0;
-   int *ids;
+   int i = 0, rv = 0, ids_count = 0, new_count = 0;
+   int *ids, *new;
 
sp = get_space(lsname);
if (!sp)
@@ -818,23 +821,50 @@ int dlm_nodeid_list(char *lsname, int **ids_out)
 
mutex_lock(&sp->members_lock);
if (!sp->members_count) {
-   rv = 0;
+   rv = -EINVAL;
+   printk(KERN_ERR "dlm: zero members_count\n");
goto out;
}
 
-   ids = kcalloc(sp->members_count, sizeof(int), GFP_KERNEL);
+   ids_count = sp->members_count;
+
+   ids = kcalloc(ids_count, sizeof(int), GFP_KERNEL);
if (!ids) {
rv = -ENOMEM;
goto out;
}
 
-   rv = sp->members_count;
-   list_for_each_entry(nd, &sp->members, list)
+   list_for_each_entry(nd, &sp->members, list) {
ids[i++] = nd->nodeid;
+   if (nd->new)
+   new_count++;
+   }
+
+   if (ids_count != i)
+   printk(KERN_ERR "dlm: bad nodeid count %d %d\n", ids_count, i);
+
+   if (!new_count)
+   goto out_ids;
+
+   new = kcalloc(new_count, sizeof(int), GFP_KERNEL);
+   if (!new) {
+   kfree(ids);
+   rv = -ENOMEM;
+   goto out;
+   }
 
-   if (rv != i)
-   printk("bad nodeid count %d %d\n", rv, i);
+   i = 0;
+   list_for_each_entry(nd, &sp->members, list) {
+   if (nd->new) {
+   new[i++] = nd->nodeid;
+   nd->new = 0;
+   }
+   }
+   *new_count_out = new_count;
+   *new_out = new;
 
+ out_ids:
+   *ids_count_out = ids_count;
*ids_out = ids;
  out:
mutex_unlock(&sp->members_lock);
diff --git a/fs/dlm/config.h b/fs/dlm/config.h
index a3170fe..4f1d6fc 100644
--- a/fs/dlm/config.h
+++ b/fs/dlm/config.h
@@ -35,7 +35,8 @@ extern struct dlm_config_info dlm_config;
 int dlm_config_init(void);
 void dlm_config_exit(void);
 int dlm_node_weight(char *lsname, int nodeid);
-int dlm_nodeid_list(char *lsname, int **ids_out);
+int dlm_nodeid_list(char *lsname, int **ids_out, int *ids_count_out,
+   int **new_out, int *new_count_out);
 int dlm_nodeid_to_addr(int nodeid, struct sockaddr_storage *addr);
 int dlm_addr_to_nodeid(struct sockaddr_storage *addr, int *nodeid);
 int dlm_our_nodeid(void);
diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index d30ea8b..c70c8e5 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -133,8 +133,10 @@ struct dlm_member {
 
 struct dlm_recover {
struct list_head list;
-   int *nodeids;
+   int *nodeids;   /* nodeids of all members */
int node_count;
+   int *new;   /* nodeids of new members */
+   int new_count;
uint64_t seq;
 };
 
diff --git a/fs/dlm/member.c b/fs/dlm/member.c
index fa17f5a..26133f0 100644
--- a/fs/dlm/member.c
+++ b/fs/dlm/member.c
@@ -210,6 +210,23 @@ int dlm_recover_members(struct dlm_ls *ls, struct dlm_recover *rv, int *neg_out)
}
}
 
+   /* Add an entry to ls_nodes_gone
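
The detection described in the commit message above can be illustrated
outside the kernel.  What follows is only a minimal userspace sketch, not
the kernel code: find_readded() and nodeid_in() are hypothetical names, and
plain int arrays are assumed to stand in for the lockspace member list and
the list of newly added nodeids.

```c
#include <stddef.h>

/* Return 1 if nodeid appears in ids[0..n-1], else 0. */
static int nodeid_in(const int *ids, size_t n, int nodeid)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (ids[i] == nodeid)
			return 1;
	}
	return 0;
}

/*
 * Hypothetical helper: a nodeid reported as new that is still in the
 * current member list was removed and re-added without the removal
 * being seen, so its old state needs clearing.  Copies up to cap such
 * nodeids into readded_out and returns how many were found in total.
 */
static size_t find_readded(const int *members, size_t member_count,
			   const int *new_ids, size_t new_count,
			   int *readded_out, size_t cap)
{
	size_t i, found = 0;

	for (i = 0; i < new_count; i++) {
		if (!nodeid_in(members, member_count, new_ids[i]))
			continue;
		if (found < cap)
			readded_out[found] = new_ids[i];
		found++;
	}
	return found;
}
```

With members {1, 2, 3} and new nodeids {3, 4}, only node 3 is reported,
since node 4 is a genuinely new member.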

Re: [Cluster-devel] [PATCH 5/6] dlm: move plock code from gfs2

2008-04-16 Thread David Teigland

On Wed, Apr 16, 2008 at 08:35:57AM -0500, David Teigland wrote:
 On Wed, Apr 16, 2008 at 05:53:43AM +0200, Fabio M. Di Nitto wrote:
  On Tue, 15 Apr 2008, David Teigland wrote:
  
  On Tue, Apr 15, 2008 at 04:02:26PM -0500, David Teigland wrote:
  Move the code that handles cluster posix locks from gfs2 into the dlm
  so that it can be used by both gfs2 and ocfs2.
  
  Attached is a patch to gfs_controld in STABLE2 to adapt to this change.
  Since the cluster STABLE2 branch is meant to run on the latest released
  kernel, this won't be committed until 2.6.26 is released.
  
  Looks like the patch is empty.
 
 thanks, second attempt

That attachment is empty too, baffling, here it is in the body,

Re: [Cluster-devel] Cluster Project branch, STABLE2, updated. cluster-2.03.02-7-ga6b6a30

2008-05-13 Thread David Teigland

On Tue, May 13, 2008 at 09:01:13PM +0100, Steven Whitehouse wrote:
 Hi,
 
 It might be a silly question, but this looks to me like trying to fix a
 kernel bug by adding a userland one. Why not simply update the kernel to
 return the correct value?

Yes, there's already a kernel fix in dlm.git, see
  dlm: fix plock dev_write return value

http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/teigland/dlm.git;a=shortlog;h=next

Suppressing the message spamming is a good solution in the mean time, and
has a better chance of getting to customers before the kernel patch.

Re: [Cluster-devel] Cluster Project branch, STABLE2, updated. cluster-2.03.02-7-ga6b6a30

2008-05-14 Thread David Teigland

On Wed, May 14, 2008 at 09:56:11AM +0100, Steven Whitehouse wrote:
 Hi,
 
 On Tue, 2008-05-13 at 15:13 -0500, David Teigland wrote:
  On Tue, May 13, 2008 at 09:01:13PM +0100, Steven Whitehouse wrote:
   Hi,
   
   It might be a silly question, but this looks to me like trying to fix a
   kernel bug by adding a userland one. Why not simply update the kernel to
   return the correct value?
  
  Yes, there's already a kernel fix in dlm.git, see
dlm: fix plock dev_write return value
  
  http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/teigland/dlm.git;a=shortlog;h=next
  
  Suppressing the message spamming is a good solution in the mean time, and
  has a better chance of getting to customers before the kernel patch.
  
 I'm afraid you've still not convinced me on this one. Why not just check
 errno as well as the return value, then you can detect both correct
 instances and still report errors when they occur.

I really didn't find it worth the time... there are lots of highly
unlikely error conditions that are just not worth logging a message about.
When I'm modifying that code again I'll look at putting back a check.

 Also the kernel fix doesn't look quite right to me. Surely we should be
 reporting the error from dlm_plock_callback() if it occurs, rather than
 just ignoring it?
 
 In dlm_plock_callback() the return value from notify() seems to be
 ignored in one case too.

Errors from notify() are the point where this whole scheme (plocks from
nfs) falls apart.  There's nothing that can be done to recover from an
error there, and the nfs people basically have to wait indefinitely for
the notify to make sure it never causes an error.  Logging the error is
the best we can do.

 I spotted this while looking at the code:
 
 struct plock_xop {
 struct plock_op xop;
 void *callback;
 void *fl;
 void *file;
 struct file_lock flc;
 };
 
 and I can't see the need for void pointers here, why not just use the
 correct types? It looks like this code could do with some cleanup,

I don't know, it doesn't look like they need to be void.  Marc Eshel
[EMAIL PROTECTED] is the one to ask, he added the support for nfs
plocks.

Re: [Cluster-devel] [PATCH] checking NULL pointer in device_write of dlm-control

2008-05-28 Thread David Teigland

On Wed, May 28, 2008 at 02:45:10PM +0900, Masatake YAMATO wrote:
 Hi,
 
 I found a way to let linux dereference NULL pointer
 in gfs2-2.6-nmw/fs/dlm/user.c. 
 
 If `device_write' method is called via dlm-control, 
 file->private_data is NULL. (See ctl_device_open() in 
 user.c. ) Though proc->flags is read:
 
   if ((kbuf->cmd == DLM_USER_LOCK || kbuf->cmd == DLM_USER_UNLOCK) &&
   test_bit(DLM_PROC_FLAGS_CLOSING, &proc->flags))
   return -EINVAL;

Thanks for the patch, I'll push it out shortly.

Dave

[Cluster-devel] Re: [Ocfs2-devel] [PATCH 0/3] ocfs2: move hb_ctl into stack glue

2008-06-02 Thread David Teigland

On Fri, May 30, 2008 at 05:36:41PM -0700, Joel Becker wrote:
 We have determined that ocfs2 can only leave a cluster group safely in
 put_super(7).  The presence of bind mounts, rbind mounts, and shared
 subtrees make tracking mountpoints impossible in userspace.
 
 To solve this, we move the ocfs2_hb_ctl call out of o2cb and into the
 generic stack glue code.  ocfs2_hb_ctl will always be called for all
 cluster stacks.  This should be compatible with old and new tools - o2cb
 behavior doesn't change.
 
 The code is also available on the 'hbctl-path' branch of my git
 repository.
 
 View:
 http://oss.oracle.com/git/?p=jlbec/linux-2.6.git;a=shortlog;h=hbctl-path
 Pull:
 git pull git://oss.oracle.com/git/jlbec/linux-2.6.git hbctl-path

This is also why I've removed the gfs umount helpers from master.

Dave

Re: [Cluster-devel] Cluster Project branch, master, updated. cluster-2.99.03-3-ge879971

2008-06-03 Thread David Teigland

On Tue, Jun 03, 2008 at 08:55:49AM +0200, Fabio M. Di Nitto wrote:
 On Mon, 2 Jun 2008, Joel Becker wrote:
 
 On Tue, Jun 03, 2008 at 05:57:35AM -, [EMAIL PROTECTED] wrote:
 commit e879971090c6821bb966f17875874d11aa740a5c
 Author: Fabio M. Di Nitto [EMAIL PROTECTED]
 Date:   Tue Jun 3 07:54:37 2008 +0200
 
 [MISC] Make several API's private again
 
 A bunch of API's have been exported and made public by mistake.
 
 libdlmcontrol, libfenced and libgfscontrol are now private again
 and no shared libraries are available.
 
  Um, is libdlmcontrol the method by which fs control daemons
 interact with dlm_controld?  If so, ocfs2_controld certainly needs that
 shared library.
 
 As long as David is ok with it.. i was explicitly told that those 
 libraries are for internal use only.
 
 David can we make a final call on those libs?

Yes, libdlmcontrol is meant to be used by ocfs2_controld.  I still think it
needs to be kept private, or internal in some way to prevent someone
from trying to use it.

[Cluster-devel] cluster3 config system

2008-06-23 Thread David Teigland

It seems there's been some confusion about what the config system (ccs
replacement) should be and do in cluster3.  There were just two ideas I
offered way back at the beginning:

1. Move the update mechanism outside the config system.

2. What remains is the part that reads the local cluster.conf file.
   This small and simple remainder should be merged into cman.

Another idea that emerged was,

3. Add the option to get cluster.conf from an LDAP server.

My impression, from casually following the progress, was that 2 and 3 were
largely done, but I wasn't sure what the status of 1 was.

An illustration I've often used for 1, is manually scp cluster.conf to
all cluster nodes.  The point of the illustration is that the update
mechanism should be external to the config system.  I've always expected
that some higher level program would actually make it simpler than manual
scp.  I think it would be great if this higher level program were conga,
or one of conga's components.  If enough customers use conga, I really
think this should be the solution for auto-updates (manual updates via scp
would always be possible).  If too few customers use conga, then we may
need to write some new higher level program to do it via command line,
but it shouldn't be any more complex or less intuitive than scp to all
nodes.

There aren't any cluster2 (RHEL5) compatibility issues here.  cluster2 and
cluster3 nodes can coexist in a cluster just fine as long as they are
using matching cluster.conf files.

Re: [Cluster-devel] logsys in fenced

2008-06-25 Thread David Teigland

On Wed, Jun 25, 2008 at 06:19:00PM +0200, Fabio M. Di Nitto wrote:
 . Leave log_debug() unchanged,
 
 The only change is that it uses logsys to print instead fprintf to stderr. 
 Like Christine already pointed out, the change to logsys is to have log 
 output the same across the whole system. No exceptions. Debug is no 
 different from that,

I'm saying it is different.  My debugging capabilities are completely
separate from logging.  You're trying to redefine them for me, and I'm
declining.

 and with logsys you can set debug logs at runtime 
 instead of having to do manual things.

Sorry, that's not what I want.

  syslog/logsys are about logging to files.
 
 this is an assumption. logsys allows you to log to file, syslog and stderr 
 according to what you need.

That's fine for the log_error statements, not for debugging statements.

 What is wrong with collecting debugging info in a standard way?

For now, my only interest in logsys is as a replacement for syslog.  Once
that works, I'll consider changes to the debug system.

 and what's the gain to keep around a macro that does nothing vs calling 
 directly log_printf?

See my other mail, and you're venturing into coding style preferences that
are not relevant to logsys.

 . Finally, one gripe with logsys itself.  Here's syslog initialization:
 
 
 Discuss this with Steven.

Yes, I'll be studying the gory logsys details in an effort to propose some
more concrete api suggestions.

Re: [Cluster-devel] [RFC] Common cluster connection handler API

2008-06-27 Thread David Teigland

On Fri, Jun 27, 2008 at 08:19:36PM +0200, Fabio M. Di Nitto wrote:
 I was actually hoping that with no more ccsd there'd be no more
 connecting to ccs, but that's probably a topic for one of the ccs
 meetings...
 
 The only partial advantage you have, as i documented and wrote to 
 cluster-devel, is that if you are connected to cman and cman_is_active, 
 you are guaranteed 99.9% to connected to ccs without problems (only 
 reason for rejection would be lack of resources on the machine, but at 
 that point you have more serious issues to worry about).

Oops, sorry, I'm still not thinking straight about the new ccs... yeah,
that makes sense that if cman is up then ccs should be there, since both
are openais extensions.  I'm curious, after cman_init() succeeds, what
more does cman_is_active() mean?  In practice would cman_init() ever be
ok, but cman_is_active() not be ok?
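
To illustrate the two-step check being asked about, here is a minimal
sketch of a bounded retry that treats "connected" and "active" as separate
conditions.  It does not use libcman: struct fake_cman, fake_cman_init()
and fake_cman_is_active() are hypothetical stand-ins that fail a
configurable number of times, and the retry limit is an assumption rather
than anything agreed on in this thread.

```c
/* Hypothetical stand-in for a cman connection, for illustration only. */
struct fake_cman {
	int init_failures_left;		/* init fails this many times first */
	int inactive_polls_left;	/* is_active says "no" this many times */
};

static int fake_cman_init(struct fake_cman *c)
{
	if (c->init_failures_left > 0) {
		c->init_failures_left--;
		return -1;
	}
	return 0;
}

static int fake_cman_is_active(struct fake_cman *c)
{
	if (c->inactive_polls_left > 0) {
		c->inactive_polls_left--;
		return 0;
	}
	return 1;
}

/*
 * Try up to max_tries times in total.  Returns 0 once the connection
 * exists and reports active, -1 if the limit is reached first.
 */
static int setup_cman_bounded(struct fake_cman *c, int max_tries)
{
	int connected = 0;
	int i;

	for (i = 0; i < max_tries; i++) {
		if (!connected) {
			if (fake_cman_init(c) < 0)
				continue;	/* a real daemon would sleep(1) here */
			connected = 1;
		}
		if (fake_cman_is_active(c))
			return 0;
	}
	return -1;
}
```

The connection, once made, is kept across later iterations, so only the
"active" check is repeated after a successful init.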

[Cluster-devel] logsys in cluster3

2008-06-30 Thread David Teigland

Main points from the logsys discussion we had

- initialization: use functions instead of macros, don't always need
  logging to be working from the start of execution, can wait until
  function is called to start it, don't need notion of subsystems
  or per-source-file logging features

- configuration setup: big blocks of setup code are repeated and largely
  the same, reduce this duplication

- configuration changes: user should be able to change logging settings
  in cluster.conf, propagate new cluster.conf, ccs should notify programs
  of cluster.conf change, programs should reread whatever dynamic config
  settings they use (may just be the logging settings), programs should
  then change behavior according to the new settings

[Cluster-devel] Re: [PATCH] dlm: fix uninitialized variable for search_rsb_list callers

2008-06-30 Thread David Teigland

On Mon, Jun 30, 2008 at 07:59:14PM +0300, Benny Halevy wrote:
 gcc 4.3.0 correctly emits the following warning.
 search_rsb_list does not set *r_ret if no dlm_rsb is found
 and _search_rsb may pass the uninitialized value upstream
 on the error path when both calls to search_rsb_list
 return non-zero error.
 
 The fix sets *r_ret to NULL on search_rsb_list's not-found path.

Added to dlm.git.
Thanks, Dave
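
The shape of the fix can be shown with a minimal userspace sketch.  This is
not the dlm code: struct rsb_sketch and search_list_sketch() are
hypothetical names, a flat array is assumed in place of the real rsb hash
list, and -ENOENT is an assumed error value.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a resource entry. */
struct rsb_sketch {
	const char *name;
};

/*
 * Look up name in list[0..n-1].  On success store the entry in *r_ret
 * and return 0.  On the not-found path *r_ret is explicitly set to
 * NULL, so a caller that passes the pointer upstream on error never
 * forwards an uninitialized value.
 */
static int search_list_sketch(struct rsb_sketch *list, size_t n,
			      const char *name, struct rsb_sketch **r_ret)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!strcmp(list[i].name, name)) {
			*r_ret = &list[i];
			return 0;
		}
	}
	*r_ret = NULL;
	return -ENOENT;
}
```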

Re: [Cluster-devel] logsys in cluster3

2008-06-30 Thread David Teigland

On Mon, Jun 30, 2008 at 06:38:48PM +0200, Fabio M. Di Nitto wrote:
 On Mon, 30 Jun 2008, David Teigland wrote:
 
 - configuration setup: big blocks of setup code are repeated and largely
  the same, reduce this duplication
 
 I will take care of this bit since i already done it.
 
 the api will look like:
 
 int gimme_logging_config_data(char *name, int debug)
 
 return 0 if ok 1 on failure
 char *name is the subsystem name as declared
 debug = 0 if no debug override is coming from cli or envvar, 1 if debug 
 has been forced by cli or envvar.
 
 I still would like to agree on be able to try to config logging as early 
 as possible and then try later if it fails tho.
 
 This problem would just disappear if we can agree on the other common 
 cluster connection bit. At that point, we configure once we connect and we 
 keep logging only the attempts to connect. Everybody is happy everafter ;)
 
  ccs should notify programs
  of cluster.conf change,
 
 cman will take care of this since ccs is just a plugin now and cman has 
 the API there and i see little gain to do it again. Ok?

OK, the details are still a little hazy for starting up a program.  When a
program starts up it needs to interact with cman, ccs and logsys, and all
three of those are somewhat interdependent.

- setup logsys nominally, so that the cman/ccs setup steps can do logging
  . if this fails, just go on

- connect to cman
  . if this fails, exit (hopefully the nominal logging above worked)

- wait for cman to be fully running
  . do we want everyone to put a finite loop around this?
  . if this fails, exit
  . keep the cman connection open as long as the program is running

- connect to ccs
  . could this fail even if cman is already ok above?  do we need a
retry loop here?
  . keep the ccs connection open as long as the program is running

- read from ccs the optional cluster.conf logging settings
  . if this fails, just go on
  . reconfigure logsys, replacing the nominal config in step 1

- as the program runs, ccs/cman notifications may arrive indicating
  that cluster.conf has changed. when one of these callbacks arrives:
  . reread the logsys config and modify logging behavior accordingly
  . reread any other dynamic cluster.conf settings
  . (I assume I poll on the ccs connection fd which tells me when there's
 a change?)

Is there anything missing?
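
The ordering above can be sketched as follows.  This is only an
illustration under stated assumptions: struct startup_env,
enum startup_result and daemon_startup() are hypothetical names, each
field holds the result a real logsys/cman/ccs call would have returned
(0 for success), and treating a failed ccs connection as fatal is an
assumption, since that point is still an open question in the list.

```c
#include <stdio.h>

/* Hypothetical per-step results; 0 means the step succeeded. */
struct startup_env {
	int logsys_nominal_rv;
	int cman_connect_rv;
	int cman_wait_rv;
	int ccs_connect_rv;
	int ccs_read_logging_rv;
};

enum startup_result {
	STARTUP_OK = 0,
	STARTUP_OK_DEFAULT_LOGGING,	/* running with the nominal log config */
	STARTUP_FAIL_CMAN,
	STARTUP_FAIL_CCS
};

static enum startup_result daemon_startup(const struct startup_env *env)
{
	/* nominal logsys setup; if this fails, just go on */
	int have_logging = (env->logsys_nominal_rv == 0);

	/* connect to cman; exit on failure */
	if (env->cman_connect_rv) {
		if (have_logging)
			fprintf(stderr, "cannot connect to cman\n");
		return STARTUP_FAIL_CMAN;
	}

	/* wait for cman to be fully running; exit on failure */
	if (env->cman_wait_rv)
		return STARTUP_FAIL_CMAN;

	/* connect to ccs (assumed fatal here) */
	if (env->ccs_connect_rv)
		return STARTUP_FAIL_CCS;

	/* read optional logging settings; if this fails, just go on */
	if (env->ccs_read_logging_rv)
		return STARTUP_OK_DEFAULT_LOGGING;

	/* a real daemon would reconfigure logsys here */
	return STARTUP_OK;
}
```

Only the cman and ccs connection steps can stop the program; both logging
steps are best effort, matching the "if this fails, just go on" notes.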

Re: [Cluster-devel] logsys in cluster3

2008-07-01 Thread David Teigland

On Tue, Jul 01, 2008 at 07:01:42AM +0200, Fabio M. Di Nitto wrote:
 No. You can just install the callback and be done with it. The ccs fd was 
 never a real fd to poll.

OK, it's a cman callback.  If the ccs connection isn't really a
connection and if it's not necessary then can we get rid of the illusion?


 Given that this code is going to be re-implemented N times, I suggest 
 again to create a cluster/common/helpers with pre-built objects to just 
 include at linking time (note that we also share and duplicate a lot of 
 header files around and it was in my mind for sometime to create a 
 cluster/common/includes too)

Let's get the code in place and replicated first, before trying to factor
it.  And I don't think it needs to be too complicated.  Assuming I can
make my simplified logsys api work (really, we're trying to solve stuff in
parallel here that should be sequential)...


/* Read cluster.conf settings and convert them into logsys values.
   If no cluster.conf setting exists, the default that was used in
   logsys_init() is used. */

int read_ccs_logging(int *mode, int *facility, int *priority, char *file,
 int *debug)
{
*mode = MYPROG_DEFAULT_MODE;
*facility = MYPROG_DEFAULT_FACILITY;
*priority = MYPROG_DEFAULT_PRIORITY;
strcpy(file, MYPROG_DEFAULT_FILE);

/* Read settings from ccs to override the defaults above.
   (with some appropriate helper functions around ccs_get, the
   following can be pretty compact)

   mode from
   /cluster/logging/@to_stderr
   /cluster/logging/@to_syslog
   /cluster/logging/@to_file

   facility from
   /cluster/logging/@syslog_facility

   priority from
   /cluster/logging/[EMAIL PROTECTED]prog_name\]/@syslog_level

   file from
   /cluster/logging/@filename

   debug from
   /cluster/logging/[EMAIL PROTECTED]prog_name\]/@debug
*/
}

/* this function will also be called when we get a cman config-update event */

void setup_logging(int *prog_debug)
{
int mode, facility, priority;
char file[PATH_MAX];

/* The debug setting is special, it's used by the program
   and not used to configure logsys. */

read_ccs_logging(&mode, &facility, &priority, file, prog_debug);
logsys_conf(mode, facility, priority, file);
}

int main(int argc, char **argv)
{
logsys_init(MYPROG_DEFAULT_MODE, MYPROG_DEFAULT_FACILITY,
MYPROG_DEFAULT_PRIORITY, MYPROG_DEFAULT_FILE);

/* set up cman/ccs connections ... */

setup_logging(&prog_debug);
}

Re: [Cluster-devel] logsys in cluster3

2008-07-01 Thread David Teigland

On Tue, Jul 01, 2008 at 12:21:21PM -0500, David Teigland wrote:
 (What doesn't work yet is the test 1 output
 following the logsys_init() call.)

was missing logsys_flush()


 +int logsys_init(char *name, int mode, int facility, int priority, char *file)
 +{
 + char *errstr;

Thinking about prefixing, when the mode is set to use syslog, syslog will
add a prefix for us, otherwise we might want a prefix like this?

strncpy(logsys_loggers[0].subsys, name,
sizeof(logsys_loggers[0].subsys));

 + logsys_config_mode_set(mode);
 + logsys_config_facility_set(name, facility);
 + logsys_config_file_set(&errstr, file);
 + _logsys_config_priority_set(0, priority);
 + if ((mode & LOG_MODE_BUFFER_BEFORE_CONFIG) == 0) {
 + _logsys_wthread_create();
 + }
 + return 0;

[Cluster-devel] fenced logsys/cman/ccs setup

2008-07-01 Thread David Teigland

Here's a patch for review that changes fenced to use the logsys api I
posted earlier.  It also includes some other minor changes related to ccs
and cman setup.  Untested, and can't be committed until the logsys api is
official.


commit 1ccaf002a5989a22cb14fde4dfc3d13260f6316f
Author: David Teigland [EMAIL PROTECTED]
Date:   Wed Jun 25 15:56:34 2008 -0500

fenced: use logsys

- Setup ccs connection once at the start and keep it open.
- Read logging configuration from ccs.
- Replace calls to syslog with calls to logsys.
- Direct debug statements to logsys.
- cman setup uses cman_is_active
- cman setup retries cman_init and cman_is_active

Signed-off-by: David Teigland [EMAIL PROTECTED]

diff --git a/fence/fenced/Makefile b/fence/fenced/Makefile
index 1e9bbc9..61ec989 100644
--- a/fence/fenced/Makefile
+++ b/fence/fenced/Makefile
@@ -15,7 +15,8 @@ OBJS= config.o \
group.o \
main.o \
member_cman.o \
-   recover.o
+   recover.o \
+   logging.o
 
 CFLAGS += -D_FILE_OFFSET_BITS=64
 CFLAGS += -I${ccsincdir} -I${cmanincdir} -I${fenceincdir} -I${openaisincdir}
diff --git a/fence/fenced/config.c b/fence/fenced/config.c
index 56b7b0e..f2a8372 100644
--- a/fence/fenced/config.c
+++ b/fence/fenced/config.c
@@ -1,7 +1,9 @@
 #include "fd.h"
 #include "ccs.h"
 
-static int open_ccs(void)
+static int ccs_handle;
+
+int setup_ccs(void)
 {
int i = 0, cd;
 
@@ -9,18 +11,62 @@ static int open_ccs(void)
sleep(1);
if (++i > 9 && !(i % 10))
log_error("connect to ccs error %d, "
- "check ccsd or cluster status", cd);
+ "check cluster status", cd);
+
+   /* FIXME: do we want this infinite? */
+   if (i > 10)
+   break;
}
-   return cd;
+
+   if (cd < 0)
+   return cd;
+
+   ccs_handle = cd;
+   return 0;
+}
+
+void read_ccs_name(char *path, char *name)
+{
+   char *str;
+   int error;
+
+   error = ccs_get(ccs_handle, path, &str);
+   if (error || !str)
+   return;
+
+   strcpy(name, str);
+
+   free(str);
+}
+
+void read_ccs_yesno(char *path, int *yes, int *no)
+{
+   char *str;
+   int error;
+
+   *yes = 0;
+   *no = 0;
+
+   error = ccs_get(ccs_handle, path, &str);
+   if (error || !str)
+   return;
+
+   if (!strcmp(str, "yes"))
+   *yes = 1;
+
+   else if (!strcmp(str, "no"))
+   *no = 1;
+
+   free(str);
 }
 
-static void read_ccs_int(int cd, char *path, int *config_val)
+void read_ccs_int(char *path, int *config_val)
 {
char *str;
int val;
int error;
 
-   error = ccs_get(cd, path, &str);
+   error = ccs_get(ccs_handle, path, &str);
if (error || !str)
return;
 
@@ -48,11 +94,8 @@ int read_ccs(struct fd *fd)
 {
char path[256];
char *str;
-   int error, cd, i = 0, count = 0;
-
-   cd = open_ccs();
-   if (cd < 0)
-   return cd;
+   int error, i = 0, count = 0;
+   int cd = ccs_handle;
 
/* Our own nodename must be in cluster.conf before we're allowed to
   join the fence domain and then mount gfs; other nodes need this to
@@ -122,7 +165,6 @@ int read_ccs(struct fd *fd)
 
log_debug("added %d nodes from ccs", count);
  out:
-   ccs_disconnect(cd);
return 0;
 }
 
diff --git a/fence/fenced/fd.h b/fence/fenced/fd.h
index 5ef1756..c74da54 100644
--- a/fence/fenced/fd.h
+++ b/fence/fenced/fd.h
@@ -10,7 +10,6 @@
 #include <errno.h>
 #include <string.h>
 #include <stdint.h>
-#include <syslog.h>
 #include <time.h>
 #include <sched.h>
 #include <sys/ioctl.h>
@@ -24,6 +23,7 @@
 
 #include <openais/saAis.h>
 #include <openais/cpg.h>
+#include <openais/service/logsys.h>
 
 #include "list.h"
 #include "linux_endian.h"
@@ -58,6 +58,7 @@
 #define GROUP_LIBCPG 3
 
 extern int daemon_debug_opt;
+extern int daemon_debug_logsys;
 extern int daemon_quit;
 extern struct list_head domains;
 extern int cman_quorate;
@@ -74,14 +75,17 @@ extern void daemon_dump_save(void);
 #define log_debug(fmt, args...) \
 do { \
snprintf(daemon_debug_buf, 255, "%ld " fmt "\n", time(NULL), ##args); \
-   if (daemon_debug_opt) fprintf(stderr, "%s", daemon_debug_buf); \
daemon_dump_save(); \
+   if (daemon_debug_opt) \
+   fprintf(stderr, "%s", daemon_debug_buf); \
+   if (daemon_debug_logsys) \
+   log_printf(LOG_DEBUG, "%s", daemon_debug_buf); \
 } while (0)
 
 #define log_error(fmt, args...) \
 do { \
log_debug(fmt, ##args); \
-   syslog(LOG_ERR, fmt, ##args); \
+   log_printf(LOG_ERR, fmt, ##args); \
 } while (0)
 
 /* config option defaults */
@@ -210,6 +214,10 @@ struct fd {
 
 /* config.c */
 
+int setup_ccs(void);
+void read_ccs_name(char *path, char *name);
+void read_ccs_yesno(char *path, int *yes, int *no);
+void

Re: [Cluster-devel] fenced logsys/cman/ccs setup

2008-07-02 Thread David Teigland

On Wed, Jul 02, 2008 at 06:31:10AM +0200, Fabio M. Di Nitto wrote:
 @@ -9,18 +11,62 @@ static int open_ccs(void)
  sleep(1);
  if (++i > 9 && !(i % 10))
  log_error("connect to ccs error %d, "
 -  "check ccsd or cluster status", cd);
 +  "check cluster status", cd);
 +
 +/* FIXME: do we want this infinite? */
 +if (i > 10)
 +break;
 
 I think we want this to be infinite (and consistent across the board) or 
 configurable the same way other daemons/tools do so that users can decide 
 how long to wait.

OK, I think the following logic supports that:

- During startup, a program first connects to cman and waits for it to be
  ready (this should be a consistent and finite retry period).

- Then the program connects to ccs.  I think we can assume that since cman
  is running, the ccs connection will work within a short time.  So,
  it's safe to just put an infinite loop around the ccs_connect.
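
The startup order described above could be sketched roughly like this.
This is only an illustration under assumptions: connect_fn, connect_finite,
connect_forever, daemon_setup and flaky_connect are hypothetical names, not
functions from fenced, libcman or libccs, and the callbacks stand in for
the real cman/ccs connect calls.

```c
#include <unistd.h>

/* Hypothetical connect callback: returns >= 0 (a handle) on success,
   < 0 on failure. */
typedef int (*connect_fn)(void);

/* Finite retry: try at most max_tries times, sleeping delay_sec after
   each failed attempt.  Returns the handle or the last negative error. */
int connect_finite(connect_fn fn, int max_tries, unsigned int delay_sec)
{
	int rv = -1, i;

	for (i = 0; i < max_tries; i++) {
		rv = fn();
		if (rv >= 0)
			break;
		sleep(delay_sec);
	}
	return rv;
}

/* Infinite retry: only returns once the connect succeeds. */
int connect_forever(connect_fn fn, unsigned int delay_sec)
{
	int rv;

	while ((rv = fn()) < 0)
		sleep(delay_sec);
	return rv;
}

/* cman first (finite wait), then ccs (infinite loop, on the assumption
   that ccs comes up shortly after cman is ready). */
int daemon_setup(connect_fn cman_connect, connect_fn ccs_connect_cb,
		 int cman_tries, unsigned int delay_sec, int *ccs_handle)
{
	int rv = connect_finite(cman_connect, cman_tries, delay_sec);

	if (rv < 0)
		return rv;
	*ccs_handle = connect_forever(ccs_connect_cb, delay_sec);
	return 0;
}

/* Stand-in for a real connect call, for illustration only: fails
   failures_left times, then returns handle 7. */
int failures_left;

int flaky_connect(void)
{
	if (failures_left > 0) {
		failures_left--;
		return -1;
	}
	return 7;
}
```

If cman never becomes ready, daemon_setup returns an error after the finite
wait and the ccs loop is never entered, so the infinite loop only runs once
cman is known to be up.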


 @@ -122,7 +165,6 @@ int read_ccs(struct fd *fd)
 
  log_debug("added %d nodes from ccs", count);
  out:
 -ccs_disconnect(cd);
  return 0;
 }
 
 I don't see any call to ccs_disconnect around. We still need to invoke it 
 on exit to close the connection to the objdb when we exit and release 
 aisexec resources.

OK, I need to rework the exit path for the daemon to close/cleanup things.
I've just been lazy in the past because an exit will almost always clean
everything up automatically anyway.


 +#define DEFAULT_FACILITY LOG_DAEMON
 
 You can either do:
 
 #define DEFAULT_FACILITY SYSLOGFACILITY
 
 or just use SYSLOGFACILITY instead.
 
 SYSLOGFACILITY is defined by the build system as default build option and 
 can be set by packagers according to distro rules/best practice/etc.

OK, that works for fenced.  What about things that aren't a daemon,
though?  Perhaps they'd like a different default?


 +#define DEFAULT_PRIORITY LOG_LEVEL_ERROR
 
 I don't have a DEFAULT_PRIORITY in the build system (maybe i can add it) 
 but it should be LOG_LEVEL_INFO. At least this is the best practise we 
 used so far.

OK, just to be clear, this setting is used:
- during startup before reading the user preferences in cluster.conf
- during running if the user has set no preference in cluster.conf

and the defaults should assume that the majority of people will never add
logging preferences to cluster.conf.  (Both because we picked good
defaults, and because they're too lazy to look up and write the xml.)


 +#define DEFAULT_FILE NULL
 
 #define DEFAULT_FILE LOGDIR "/fenced.log"
 
 LOGDIR is set by the build system (same reasons as SYSLOGFACILITY). We 
 want files by default consistently across the board.

I think that by default we should probably have all logging go to
/var/log/messages like it has in the past.  Or, if we're daring, maybe
have it all go to /var/log/cluster.log.  But, I really doubt that people
want all programs to log to different files.


 +#define LEVEL_PATH 
 /cluster/logging/[EMAIL PROTECTED]FENCED\]/@syslog_level
 +#define DEBUG_PATH 
 /cluster/logging/[EMAIL PROTECTED]FENCED\]/@debug
 
 I am kind of curious.. why do you add defines for some queries but not for 
 others?

I'm still in limbo about what coding style I like there; in this case it's
if the line goes over 80 chars.


 +/* Read cluster.conf settings and convert them into logsys values.
 +   If no cluster.conf setting exists, the default that was used in
 +   logsys_init() is used.
 +
 +   mode from
 +   /cluster/logging/@to_stderr
 +   /cluster/logging/@to_syslog
 +   /cluster/logging/@to_file
 +
 +   facility from
 +   /cluster/logging/@syslog_facility
 +
 +   priority from
 +   /cluster/logging/[EMAIL PROTECTED]prog_name\]/@syslog_level
 +
 +   file from
 +   /cluster/logging/@filename
 +
 +   debug from
 +   /cluster/logging/[EMAIL PROTECTED]prog_name\]/@debug
 +*/
 
 This logic is almost ok.
 
 You need to check for:
 
 /cluster/logging/@debug - global debug on/off switch.
 
 the other daemons respect the global debug setting if no subsystem debug
 setting is available.

OK, so subsystem setting has priority over global setting.
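
A minimal sketch of that precedence, assuming each setting is read as a
tri-state (not present, off, on).  The names debug_setting and
resolve_debug are made up for illustration and are not from the patch.

```c
/* Tri-state for a cluster.conf debug attribute: absent, "off" or "on". */
enum debug_setting { DEBUG_UNSET = -1, DEBUG_OFF = 0, DEBUG_ON = 1 };

/* The subsystem setting wins when present; otherwise fall back to the
   global /cluster/logging/@debug value; otherwise debugging is off. */
int resolve_debug(enum debug_setting global, enum debug_setting subsys)
{
	if (subsys != DEBUG_UNSET)
		return subsys == DEBUG_ON;
	if (global != DEBUG_UNSET)
		return global == DEBUG_ON;
	return 0;
}
```

With global on and the subsystem explicitly off, the result is off, which
is the "disable debugging only for this subsystem" case.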


 You also want to allow debugging to be set via cmdline and envvar.
 
 This allows a great control of debugging.
 
 The use cases are:
 
 - set /cluster/logging/@debug to on:
   all daemons across the whole cluster are in debug mode (and logging
   debug info). You don't need to add X subsystem lines to achieve this.

ok

 
 - /cluster/logging/[EMAIL PROTECTED]prog_name\]/@debug:
   users either want to debug only this system or disable debugging only
   for this subsystem across the whole cluster when master debug is on.

ok

 
 - cmdline debug:
   allows the user to enable debugging manually only on that specific
   daemon on that specific node.
 
 - envvar debug on (see for example CMAN_DEBUG or CCSD_DEBUG or QDISK_...):
   these are easy to set within init scripts

Re: [Cluster-devel] fenced logsys/cman/ccs setup

2008-07-02 Thread David Teigland

On Wed, Jul 02, 2008 at 10:49:05AM -0500, David Teigland wrote:
  #define DEFAULT_FILE LOGDIR "/fenced.log"
  
  LOGDIR is set by the build system (same reasons as SYSLOGFACILITY). We 
  want files by default consistently across the board.
 
 I think that by default we should probably have all logging go to
 /var/log/messages like it has in the past.  Or, if we're daring, maybe
 have it all go to /var/log/cluster.log.  But, I really doubt that people
 want all programs to log to different files.

Thinking more about this, debug logs would definitely make sense in
per-program log files like fenced.log.  I was thinking about the error
messages we're switching from syslog to logsys.  So, how do we tell logsys
to use /var/log/messages for errors and a separate file for debug output?

Re: [Cluster-devel] fenced logsys/cman/ccs setup

2008-07-02 Thread David Teigland

On Wed, Jul 02, 2008 at 08:22:54PM +0200, Fabio M. Di Nitto wrote:
 I was thinking about the error
 messages we're switching from syslog to logsys.  So, how do we tell logsys
 to use /var/log/messages for errors and a separate file for debug output?
 
 Why do you need this? Remember that you are setting a syslog facility. 
 syslog doesn't necessarily send info locally and a certain facility can be 
 directed to a specific file (even remote).
 
 The normal /var/log/cluster/fenced.log will collect everything locally (if 
 enabled).

OK, I wasn't properly thinking about the modes.

Because our default mode is LOG_MODE_OUTPUT_SYSLOG_THREADED, the file
won't be used by default, and error messages will still go to
/var/log/messages (by logsys using syslog).

If someone edits cluster.conf and just adds cluster/logging/to_file=yes,
without setting cluster/logging/filename, then all log_printf's will go to
the default filename (in addition to other places if other modes are set).
This is the only occasion when the default filename will be used.

Next, to verify how setting/unsetting the modes works.

By default we have LOG_MODE_OUTPUT_SYSLOG_THREADED.
- errors go to /var/log/messages
- debug goes nowhere
- (if debug was set, debug would go to /var/log/messages)

If someone sets to_file=yes, it adds LOG_MODE_OUTPUT_FILE, resulting
in LOG_MODE_OUTPUT_SYSLOG_THREADED | LOG_MODE_OUTPUT_FILE.
- errors go to both /var/log/messages and /var/log/fenced.log
- debug goes nowhere
- (if debug was set, debug would go to both /var/log/messages and
  /var/log/fenced.log)

If someone sets to_file=yes to_syslog=no, it adds LOG_MODE_OUTPUT_FILE,
removes LOG_MODE_OUTPUT_SYSLOG_THREADED, leaving just LOG_MODE_OUTPUT_FILE.
- errors go to /var/log/fenced.log
- debug goes nowhere
- (if debug was set, debug would go to /var/log/fenced.log)
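
The three cases above amount to setting and clearing bits in the mode
word.  A rough sketch, assuming yes/no pairs like the ones
read_ccs_yesno() returns (both zero when the option is absent).
EX_MODE_FILE and EX_MODE_SYSLOG are placeholders standing in for
LOG_MODE_OUTPUT_FILE and LOG_MODE_OUTPUT_SYSLOG_THREADED; their values
here are invented, not the real logsys values, and apply_yesno is a
hypothetical helper.

```c
/* Placeholder flag values, NOT the real logsys definitions. */
#define EX_MODE_FILE   (1u << 0)
#define EX_MODE_SYSLOG (1u << 1)

/* Apply one yes/no option to the mode: "yes" sets the flag, "no" clears
   it, and an absent option (yes == 0 && no == 0) leaves the default. */
unsigned int apply_yesno(unsigned int mode, unsigned int flag,
			 int yes, int no)
{
	if (yes)
		mode |= flag;
	else if (no)
		mode &= ~flag;
	return mode;
}
```

Starting from the default EX_MODE_SYSLOG, to_file=yes gives both flags, and
adding to_syslog=no leaves only EX_MODE_FILE.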


Say there's a normal setup where no logging is configured, and errors are
going to /var/log/messages.  Then a person wants to turn on debugging and
collect the debug messages in /var/log/fenced.log.  What combination of
to_file, to_syslog, filename, and debug will allow that?  I can't find
any, and I think that would be one of the most common things people would
want.

I don't want to call all this confusing, because I can't offer any better
suggestions, but...  (And we haven't added command line options or
environment variables to the picture yet.)

[Cluster-devel] cluster.git at fedorahosted.org

2008-07-22 Thread David Teigland

We're planning to move the cluster git tree to fedorahosted.org, to take
advantage of the CLA system in place there.  To push changes to the new
git tree you'll need to:

- Set up an account at https://admin.fedoraproject.org/accounts/user/new
- Sign the CLA on your new account page
- Request addition to the gitcluster group at
  https://admin.fedoraproject.org/accounts/group/view/gitcluster

Re: [Cluster-devel] [PATCH/RFC] Standardize on /etc/sysconfig/cluster for init script

2008-07-30 Thread David Teigland

On Mon, Jul 28, 2008 at 10:20:49AM +0200, Fabio M. Di Nitto wrote:
 
 Hi guys,
 
 I just noticed that we have a very inconsistent way to set init script 
 defaults by using /etc/sysconfig/{cman,cluster,scsi_reserve}.
 
 the patch in attachment is very simple and standardize everything to 
 /etc/sysconfig/cluster and retains backward compatibility.
 
 Please ACK or i will apply.

Doesn't it make most sense for the name of the sysconfig file to match the
name of the init script it corresponds to?  i.e. the config file for
init.d/cman would be sysconfig/cman?

Re: [Cluster-devel] When is fencing considered successful?

2008-10-29 Thread David Teigland

On Wed, Oct 29, 2008 at 11:12:41AM -0500, Kevin Anderson wrote:
 Hi all,
 
 Recently we had cluster customer where the fencing agent successfully
 powered off a node, but the script failed to power the node back on due
 to firmware changes on the fencing device.
 
 Question is whether we should consider this a successful fence event or
 does the agent have to complete everything?   The primary purpose of the
 fencing agent is to stop the node from participating in the cluster.  In
 this case, that was successful.  But, because the power on attempt
 failed, the fence agent reported failure and the cluster hung waiting
 for a followup action.  
 
 It would seem to me that the fence agent should report success in this
 case, and maybe post a warning message about the failure to power on.
 
 Thoughts?

Definitely

Re: [Cluster-devel] cluster/logging settings

2008-10-30 Thread David Teigland

On Thu, Oct 30, 2008 at 07:20:21PM +0100, Fabio M. Di Nitto wrote:
 a,b,c,x,y,z are connections that are all *controlled independently*
 a is always on
 b is always on
 c is connected if debug=on
 x is connected if to_stderr=yes
 y is connected if to_syslog=yes
 z is connected if to_file=yes
 
 sources            destinations  (destination-specific options)
 
 error --a--|   |--x-- stderr
 warn  --b--|---|--y-- syslog (syslog_facility, syslog_level)
 debug --c--|   |--z-- file   (logfile)
 
 
 syslog_facility and syslog_level settings are only passed to syslog and do
 not affect connections in this picture at all.  Similarly, logfile is only
 relevant to the file output, and does not affect any connections.
 
 So, a,b,c can all be turned on, and if y is enabled, then syslog can be
 tuned to filter some of them out.  There's no way to selectively filter
 things out of file or stderr; if x or z are turned on, they will let
 everything pass that comes down the pipe (a,b,c).
 
 let's assume this:
 
 logt_print(LOG_CRIT, "critblabla\n");
 logt_print(LOG_INFO, "infoblabla\n");
 if(debug)
   logt_print(LOG_DEBUG, "debugblabla\n");
 
 If I set log_level to LOG_INFO, I do expect to see both critblabla and 
 infoblabla on all selected outputs.

Note that you've said log_level, but it's actually syslog_level, which
might lead people to different conclusions about what the option does.

 If I set log_level to LOG_DEBUG, I clearly expect to see debugblabla 
 passing through as well.
 (except if filtering to syslog is enabled, but we already agreed on this
 as required feature so I won't mention it as special case any longer).
 
 I often expect that enabling LOG_DEBUG as priority, it will also enable 
 debugging code in general and vice versa. If I set debug=on, I want to be 
 able to catch LOG_DEBUG automatically, because i assume that most of the 
 LOG_DEBUG is wrapped between if(debug) { } statements. I don't feel the 
 need to set two options to enable full debugging.

You don't need to.  Under my scheme, you set debug=on, and full debugging
appears in the log file where most people want it, and it doesn't appear
in syslog where most people wouldn't want it.  If someone *does* want all
debugging to appear in syslog, then they set debug=on and
syslog_level=debug to get it... (and once they see the result, they'll
change it back, because it's really not nice to see all that in
/var/log/messages.)

Under your scheme, syslog_level and debug change each other in confusing
and redundant ways.  By setting debug=on you automatically get all
debugging in syslog (in addition to the logfile usually), which is not
where we wanted it...  So we added yet another complication:
LOG_MODE_FILTER_DEBUG_FROM_SYSLOG, which counteracts the bad effects of
the debug/syslog_level interaction which we didn't really want in the
first place.  See how complicated this gets when you have one option
changing another one, but not quite the way you want, so you add another
flag to work around the unintended effects of one setting implicitly
changing another?  It's bad all around: just keep them independent and all
the pain goes away.

(We wouldn't need FILTER_DEBUG_FROM_SYSLOG in my scheme because you
control each source and destination point explicitly; you say exactly what
you want, and get it.)

 Now, clearly this is only my expectation and that's how I wrote 
 ccs_read_logging to act.
 
 Probably what we are mixing up here are:
 - LOG_DEBUG for print operation != LOG_DEBUG for syslog.
 - debug=on != LOG_DEBUG.
 
 I don't have any objections to roll back and make debug=on different from 
 priority=LOG_DEBUG as long as all the others agree on what they want (and 
 both is not an option ;)).

Re: [Cluster-devel] cluster/logging settings

2008-11-04 Thread David Teigland

On Thu, Oct 30, 2008 at 11:26:14PM -0700, Steven Dake wrote:
 There are two types of messages.  Those intended for users/admins and
 those intended for developers.
 
 Both of these message types should always be recorded *somewhere*.  The
 entire concept of LOG_LEVEL_DEBUG is dubious to me.  If you want to
 stick with that semantic and definition that is fine, but really a
 LOG_LEVEL_DEBUG means this message is for the developer.  These
 messages should be recorded and stored when a process segfaults, aborts
 due to assertion, or at administrative request.  Since the frequency of
 these messages is high there is no other option for recording them since
 they must _always_ be recorded for the purposes of debugging a field
 failure.  Recording to disk or syslog has significant performance
 impact.
 
 The only solution for these types of messages is to record them into a
 flight recorder buffer which can be dumped:
 1) at segv
 2) at sigabrt
 3) at administrative request
 
 This is a fundamental difference in how we have approached logging
 debugging messages in the past but will lead to the ability to ensure we
 _always_ have debug trace data available instead of telling the
 user/admin "Go turn on debug and hope you can reproduce that error and
 btw since 10k messages are logged your disk will fill up with
 irrelevant debug messages and your system will perform like mud".
 
 Logging these in memory is the only solution that I see as suitable and
 in all cases they should be filtered from any output source such as
 stderr, file, or syslog.

There's a difference between high volume trace debug data stored in
memory, and low volume informational debug data that can be easily written
to a file.  Both kinds of data can be useful.

My programs are simple enough that low volume informational debug data is
enough for me to identify and fix a problem.  So, low volume informational
data is all I produce.  It can be useful to write this data to a file.

Your program is complex enough that high volume trace debug data is
usually needed for you to identify and fix a problem.  So, high volume
trace data is all you produce.  This is too much data to write to a file
(by the running program).

So, we're using DEBUG to refer to different things.  We need to define
two different levels (just for clarity in this discussion):
. DEBUGLO is low volume informational data like I use
. DEBUGHI is high volume trace data like you use

DEBUGHI messages wouldn't ever be logged to files by the program while
running.  DEBUGLO messages could be, though, if the user configured it.
So, circling back around, how should a user configure DEBUGLO messages to
appear in syslog or a logfile?   In particular, what would they enter in
the cluster.conf logging/ section?  My suggestion is:

  syslog_level=foo
  logfile_level=bar

where foo and bar are one of the standard priority names in syslog.h.
So, if a user wanted DEBUGLO messages to appear in daemon.log, they'd set

  logging/daemon/logfile_level=debug

and if they wanted DEBUGLO messages to appear in /var/log/messages,

  logging/daemon/syslog_level=debug

(Note that debug means DEBUGLO here because DEBUGHI messages are only
saved in memory, not to files by a running program.)
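
A sketch of how independent per-destination levels could work, using the
standard syslog.h priorities (smaller number means more severe).  struct
log_levels, want_syslog and want_logfile are hypothetical names for
illustration only, not an existing logsys or cluster API.

```c
#include <syslog.h>

/* One threshold per destination, each set independently. */
struct log_levels {
	int syslog_level;	/* e.g. LOG_INFO */
	int logfile_level;	/* e.g. LOG_DEBUG */
};

/* A message goes to a destination when it is at least as severe as that
   destination's threshold (numerically less than or equal). */
int want_syslog(const struct log_levels *l, int priority)
{
	return priority <= l->syslog_level;
}

int want_logfile(const struct log_levels *l, int priority)
{
	return priority <= l->logfile_level;
}
```

With syslog_level=info and logfile_level=debug, errors reach both
destinations while debug messages only reach the logfile, and neither
setting changes the other.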

There's another separate question I have about corosync, and that's
whether you could identify some limited number of messages that would be
appropriate for DEBUGLO?  They would be used by non-experts to do some
rough debugging of problems, and by experts to narrow down a problem
before digging into the high volume trace data.  I'd suggest that a good
starting point for DEBUGLO would be the data that openais has historically
put in /var/log/messages.  Data that helps you quickly triage a problem
(or verify that things are happening correctly) without stepping through
all the trace data.

Re: [Cluster-devel] cluster/logging settings

2008-11-04 Thread David Teigland

On Tue, Nov 04, 2008 at 02:58:47PM -0600, David Teigland wrote:
 the cluster.conf logging/ section?  My suggestion is:
 
   syslog_level=foo
   logfile_level=bar

FWIW, I'm not set on this if someone has a better suggestion.  I just want
something unambiguous.  debug=on has been shown to mean something
different to everyone.

Dave

Re: [Cluster-devel] logging: final call on configuration, output and implementation

2008-11-10 Thread David Teigland

On Mon, Nov 10, 2008 at 08:09:10PM +0100, Fabio M. Di Nitto wrote:
 Hi all,
 
 those logging threads have been going on for way too long. It's time to
 close them and make a final decision. This is a long email, so please
 take time to read it all.
 
 This is a recap of what I believe a user would like to see:
 
 - a consistent, easy and quick way to configure logging.
 - a reasonable default if nothing is specified.
 - a consistent, easy to read, output.

I like this.  Two minor points regarding the actual terminology; I'd like
to be a little more consistent and identify some keywords.  Right now
the word "log" is combined with other words in a bunch of ways (logging,
logger, logfile, syslog, log_foo).  How about:

- logging, logging_subsys (common keyword "logging")
  for the config file section tags

- to_syslog, syslog_facility, syslog_priority (common keyword "syslog")
  for every parameter related to syslog

- to_logfile, logfile, logfile_priority (common keyword "logfile")
  for every parameter related to logfile

And then we have some values that are "on/off" and others that are
"yes/no"; let's pick one.

Dave

[Cluster-devel] Re: [DLM] Fix up memory alloc/kmap

2008-11-13 Thread David Teigland

On Thu, Nov 13, 2008 at 09:56:18AM +, Steven Whitehouse wrote:
 It is left as an exercise for the reader to consider whether it's a bug
 that DLM isn't using highmem pages for its internal buffers (in which
 case we'd have to solve the allocation problem at kmap time), or whether
 it's a bug that the kmap/kunmap pairs are there at all (and can thus be
 removed, which is the simpler solution) :-)

Thanks, I'll remove all the kmap/kunmap calls, then.

Re: [Cluster-devel] [RFC] Splitting cluster.git into separate projects/trees

2008-11-14 Thread David Teigland

On Fri, Nov 14, 2008 at 10:18:13AM +0100, Fabio M. Di Nitto wrote:
 At this point we haven't really settled how many (sub) project will be
 created out of this split. This will come once we agree how to split.

I like the third option as long as the number of new git trees doesn't
explode (obviously no one wants 10 new git trees.)  Not to get ahead of
you, but for my own curiosity I looked at what minimum number of git trees
I'd have to start juggling... it's not too bad, but more than this might
get out of hand.

dlm.git:
  libdlm, dlm_controld, libdlmcontrol, dlm_tool

fence.git:
  libfence, fenced, libfenced, fence_tool, fence_node

fence-agents.git:
  lots

cman.git:
  libcman, cman_tool, cmannotifyd, qdiskd, mkqdisk
  cluster/config/*
  move plugins into corosync tree?
  group_tool (groupd/libgroup won't exist, group_tool will just be a
  wrapper/shortcut for fence_tool/dlm_tool/gfs_control queries;
  maybe include queries of other related daemons, like ocfs2_controld?)

gfs2-utils.git:
  gfs_controld, libgfscontrol, gfs_control
  mount.gfs2, mount.gfs, libgfs, libgfs2
  gfs_debug, gfs_fsck, gfs_grow, gfs_jadd, gfs_mkfs, gfs_quota, gfs_tool
  gfs2_convert, gfs2_edit, gfs2_fsck, gfs2_mkfs, gfs2_quota, gfs2_tool

gfs-kernel.git:
  gfs.ko

rgmanager.git:

gnbd goes away
cmirror moves away

Re: [Ocfs2-devel] [Cluster-devel] [RFC] Splitting cluster.git into separate projects/trees

2008-11-14 Thread David Teigland

On Fri, Nov 14, 2008 at 10:11:00PM +0100, Andrew Beekhof wrote:
 I'd have thought fence.git and fence-agents.git in one and cman.git
 and rgmanager.git in another.
 But I may be missing some of the interdependencies.

I wouldn't mind either of those combinations.  Maybe rgmanager's last
stand will be in cluster.git anyway... if so, then it's not a factor.

I didn't have much reason for separating fence/fence-agents.  We're
planning on unifying it all anyway, even if the agents are done sooner.
And I don't think packaging/releasing agents separately should have much
bearing on the source tree?  (I've heard interest in putting agents in
their own package for Fedora.)

Dave

Re: [Cluster-devel] GFS2: Send useful information with uevent messages

2008-12-01 Thread David Teigland

On Thu, Nov 27, 2008 at 10:45:21AM +, Steven Whitehouse wrote:
 From 04b985e291c464092516d0d1a4387b866389a85d Mon Sep 17 00:00:00 2001
 From: Steven Whitehouse [EMAIL PROTECTED]
 Date: Thu, 27 Nov 2008 09:42:51 +
 Subject: [PATCH] GFS2: Send useful information with uevent messages

 In order to distinguish between two differing uevent messages
 and to avoid using the (racy) method of reading status from
 sysfs in future, this adds some status information to our
 uevent messages.

 Btw, before anybody says sysfs isn't racy, I'm aware of that,
 but the way that GFS2 was using it (send an ambiguous uevent and
 then expect the receiver to read sysfs to find out the status
 of the reported operation) was.

Not as long as gfs_controld tells gfs-kernel to recover journals one at a
time on a node, which is what it does to avoid this problem.

[Cluster-devel] Re: Groupd uevent clean up

2008-12-01 Thread David Teigland

On Fri, Nov 28, 2008 at 11:07:56AM +, Steven Whitehouse wrote:
 LOCKTABLE=clustername:fsname
 LOCKPROTO=[lock_dlm|lock_nolock]
 
 to avoid all the messy parsing of the initial event string. Also I've
 added some further information to the two change events, so that we
 now have:
 
 FIRSTMOUNT=Done
 
 when the first mounter has finished and:
 
 JID=journal_id
 RECOVERY=[Done|Failed]
 
 when recovery has finished. Is there anything else that is useful I
 wonder, while I'm adding new items here? I think I've covered the
 important bits anyway.

That's excellent, much better than the sysfs files.  We'll need to keep
the sysfs files around for a while, though, so we don't break the
user/kernel interface.

Dave

[Cluster-devel] gfs uevent and sysfs changes

2008-12-01 Thread David Teigland

Here are the compatibility aspects to the recent ideas about changes to
the user/kernel interface between gfs (1 & 2) and gfs_controld.

. gfs_controld can remove id from hostdata string in mount options

  - no compat issues AFAICT

. getting rid of id sysfs file from lock_dlm

  - new gfs_controld old gfs-kernel
    old kernel provides both block and id sysfs files
    new daemon looks for block instead of id in sysfs

  - old gfs_controld new gfs-kernel
    old daemon looks for id sysfs file
    new kernel needs to provide id as well as block sysfs files

  Once everyone is using the new daemon, we can remove the id sysfs
  file from the kernel.

. uevent strings to replace recover_done/recover_status sysfs files

  - new gfs_controld old gfs-kernel
    old kernel has recover sysfs files, and no new uevent strings
    new daemon needs to look for either sysfs files or uevent strings

  - old gfs_controld new gfs-kernel
    old daemon looks for recover sysfs files, not new uevent strings
    new kernel needs to provide both sysfs files and uevent strings

  Once everyone is using new kernel and new daemon, we can remove
  the recover sysfs files from kernel, and daemon can stop looking for
  recover sysfs files.
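
For the "new daemon needs to look for either sysfs files or uevent
strings" case, the daemon side might look something like the sketch
below.  It assumes the uevent arrives as a NULL-terminated array of
KEY=VALUE strings using the JID/RECOVERY keys proposed earlier;
uevent_get and recovery_status are hypothetical helpers, not functions
from gfs_controld.

```c
#include <stddef.h>
#include <string.h>

/* Return the value for key in a NULL-terminated array of "KEY=VALUE"
   strings, or NULL if the key is not present. */
const char *uevent_get(const char *const *env, const char *key)
{
	size_t klen = strlen(key);

	for (; *env; env++) {
		if (!strncmp(*env, key, klen) && (*env)[klen] == '=')
			return *env + klen + 1;
	}
	return NULL;
}

/* Returns 1 if RECOVERY=Done, 0 for any other RECOVERY value (treated
   as failure here), and -1 if the key is absent, in which case the
   caller is talking to an old kernel and should fall back to reading
   the recover sysfs files. */
int recovery_status(const char *const *env)
{
	const char *val = uevent_get(env, "RECOVERY");

	if (!val)
		return -1;
	return !strcmp(val, "Done") ? 1 : 0;
}
```

The -1 return is the compatibility hook: it lets one daemon binary work
with both old and new kernels until the sysfs files can be dropped.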

Re: [Cluster-devel] gfs uevent and sysfs changes

2008-12-04 Thread David Teigland

On Thu, Dec 04, 2008 at 01:32:31PM -0500, david m. richter wrote:
 On Mon, Dec 1, 2008 at 12:31 PM, David Teigland [EMAIL PROTECTED] wrote:
  Here are the compatibility aspects to the recent ideas about changes to
  the user/kernel interface between gfs (1 & 2) and gfs_controld.
 
  . gfs_controld can remove id from hostdata string in mount options
 
 hi david,
 
 I know I'm a peripheral consumer of the cluster suite, but I thought
 I'd chime in and say that I am currently using the id as passed into
 the kernel in the hostdata string (I believe by mount.gfs2?) in my
 pNFS work.  does the above "gfs_controld can remove id from hostdata
 string" comment refer to something orthogonal, or would it affect what
 gets stored in the superblock's hostdata at mount time?

yes

 ..hm, sorry, I don't have the code right in front of me, but is that
 id in the hostdata string the same thing as the mountgroup id?  if
 so, then my above worry about the hostdata string is moot, because if
 gfs_controld still has that info I can just make a downcall.

Yes, it's created in gfs_controld, and passed to mount.gfs via the
hostdata string which is then passed into the kernel during mount(2).

Previously, gfs-kernel (lock_dlm actually) would pass this id back up to
gfs_controld within the plock op structures.  This was because plock ops
for all gfs fs's were funnelled to gfs_controld through a single misc
device.  gfs_controld would match the op to a particular fs using the id.

The dlm does this now, using the lockspace id.

Dave

Re: [Cluster-devel] gfs uevent and sysfs changes

2008-12-05 Thread David Teigland

On Fri, Dec 05, 2008 at 09:51:45AM +, Steven Whitehouse wrote:
 In that case gfs2 should be able to generate the id itself from the
 fsname and it still doesn't need it passed in, even if it continues to
 expose the id in sysfs.
 
 Perhaps better still, it should be possible for David to generate the id
 directly if he really needs it from the fsname.

It's not actually a crc of the fsname, but a crc of the cpg name
gfs_controld creates for the mountgroup, which is gfs:mount:fsname.
Also, we may at some point want to allow that generated id to be overridden
by one that's set explicitly.
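
To make the distinction concrete, here is a sketch of deriving an id from
the cpg name rather than from the bare fsname.  This is an assumption-laden
illustration: the message does not say which CRC variant gfs_controld
uses, so the common reflected CRC-32 (polynomial 0xEDB88320) is used as a
stand-in, and crc32_bytes and mountgroup_id are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bitwise reflected CRC-32 (init 0xFFFFFFFF, final xor 0xFFFFFFFF).
   A stand-in only; the daemon's actual CRC routine may differ. */
uint32_t crc32_bytes(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t crc = 0xFFFFFFFFu;
	int k;

	while (len--) {
		crc ^= *p++;
		for (k = 0; k < 8; k++)
			crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u
					 : crc >> 1;
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Build the cpg name "gfs:mount:" + fsname and checksum that whole
   string, so the id differs from a CRC of the fsname alone. */
uint32_t mountgroup_id(const char *fsname)
{
	char name[256];

	snprintf(name, sizeof(name), "gfs:mount:%s", fsname);
	return crc32_bytes(name, strlen(name));
}
```

Anything that wants to reproduce the id outside the daemon would need the
same prefix and the same CRC routine, which is part of why an explicit
override may be useful.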

 worry about!), and I don't see that netlink should have any more
 overhead than any other method of sending messages.

netlink is painful compared to uevents, look at dlm_controld/netlink.c
which uses the generic netlink interface to transfer a data structure
from the kernel to userspace.  A library would help, but there didn't seem
to be a de facto netlink lib when I needed it, maybe that's changed.

Re: [Cluster-devel] gfs uevent and sysfs changes

2008-12-05 Thread David Teigland

On Fri, Dec 05, 2008 at 08:52:58AM -0600, David Teigland wrote:
 On Fri, Dec 05, 2008 at 09:51:45AM +, Steven Whitehouse wrote:
  In that case gfs2 should be able to generate the id itself from the
  fsname and it still doesn't need it passed in, even if it continues to
  expose the id in sysfs.
  
  Perhaps better still, it should be possible for David to generate the id
  directly if he really needs it from the fsname.
 
 It's not actually a crc of the fsname, but a crc of the cpg name
 gfs_controld creates for the mountgroup, which is gfs:mount:fsname.
  Also, we may at some point want to allow that generated id to be overridden
 by one that's set explicitly.

The fact that this id comes from gfs_controld, and becomes available only
during mount, makes me think it's not well suited to be the statfs fsid.
GFS should probably do its own thing for statfs (like a hash of just the
fsname) instead of depending on gfs_controld for it.  With nolock the
daemons won't be there, and we'd still want the same fsid to be produced.

[Cluster-devel] Re: + fs-dlm-astc-fix-warning.patch added to -mm tree

2008-12-22 Thread David Teigland

On Mon, Dec 22, 2008 at 09:22:56AM +, Steven Whitehouse wrote:
  Cleans code up.
  
  Might be wrong.
  
  This is an O(n*n) search :(
  
 That's true, but for fairly low values of n in general. Also the dlm
 locking will only be stopped for a lockspace in the case that we are in
 recovery, so that dlm_lock_stopped() is normally false so that this is
 basically iterating down the list, removing each item in turn.

Right, it just takes the first lock off the list, unless the lockspace is
being recovered which is seldom the case.  In the recovery case, it just
skips past the locks in the lockspace being recovered.  So, I would not
call this O(n*n).

I do like the patch, I hope to get it into the next queue by tomorrow.
Thanks,
Dave

Re: [Cluster-devel] Re: [PATCH 1/2] dlm: initialize file_lock struct in GETLK before copying conflicting lock

2009-01-22 Thread David Teigland

On Wed, Jan 21, 2009 at 06:42:39PM -0500, J. Bruce Fields wrote:
 On Wed, Jan 21, 2009 at 11:34:50AM -0500, Jeff Layton wrote:
  dlm_posix_get fills out the relevant fields in the file_lock before
  returning when there is a lock conflict, but doesn't clean out any of
  the other fields in the file_lock.
  
  When nfsd does a NFSv4 lockt call, it sets the fl_lmops to
  nfsd_posix_mng_ops before calling the lower fs. When the lock comes back
  after testing a lock on GFS2, it still has that field set. This confuses
  nfsd into thinking that the file_lock is a nfsd4 lock.
 
 I think of the lock system as supporting two types of objects, both
 stored in struct lock's:
 
   - Heavyweight locks: these have callbacks set and the filesystem
 or lock manager could in theory have some private data
 associated with them, so it's important that the appropriate
 callbacks be called when they're released or copied.  These
 are what are actually passed to posix_lock_file() and kept on
 the inode lock lists.
   - Lightweight locks: just start, end, pid, flags, and type, with
 everything zeroed out and/or ignored.
 
 I don't see any reason why the lock passed into dlm_posix_get() needs to
 be a heavyweight lock.  In any case, if it were, then dlm_posix_get()
 would need to release the passed-in-lock before initializing the new one
 that it's returning.

It seems the nfs code is mixing those two types up a bit.  Regardless, the
rationale I see in Jeff's dlm patch is to make the two different locking paths
equivalent:

Without cfs/dlm,
nfsd4_lockt -> nfsd_test_lock -> vfs_test_lock -> posix_test_lock

With cfs/dlm,
nfsd4_lockt -> nfsd_test_lock -> vfs_test_lock -> (cfs) -> dlm_posix_get

When there's a conflict, dlm_posix_get() and posix_test_lock() should do the
same/equivalent things to the fl they are given.

posix_test_lock() does __locks_copy_lock() on the fl and then sets the pid.
dlm_posix_get() isn't using __locks_copy_lock() because it doesn't have a
conflicting file_lock to copy from.  Jeff's patch does nearly the same thing
using locks_init_lock() plus the existing assignments.  But, I think the best
solution may be for dlm_posix_get() to set up a new lightweight file_lock with
the values we need, and then call __locks_copy_lock() with it, just like
posix_test_lock().

Dave
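
The "set up a new lightweight lock, then copy it" idea can be sketched in userspace as below. struct lock_sketch and report_conflict are hypothetical stand-ins, not the kernel's struct file_lock or __locks_copy_lock(); a plain struct assignment plays the role of the copy. The sketch only shows why building the result from a zeroed temporary keeps stale fields such as the callback pointer from leaking back to the caller.

```c
#include <string.h>

/* Hypothetical, simplified lock: just the lightweight fields plus one
 * callback-table pointer that marks a "heavyweight" lock when non-NULL. */
struct lock_sketch {
	long long start;
	long long end;
	int pid;
	int type;
	const void *lmops;
};

/* Report a conflict into *fl: build a fresh lightweight lock holding only
 * the conflicting lock's values, then copy it over the caller's lock so
 * that nothing the caller set beforehand (such as lmops) survives. */
static void report_conflict(struct lock_sketch *fl, long long start,
			    long long end, int pid, int type)
{
	struct lock_sketch tmp;

	memset(&tmp, 0, sizeof(tmp));
	tmp.start = start;
	tmp.end = end;
	tmp.pid = pid;
	tmp.type = type;

	*fl = tmp;	/* stand-in for the kernel's copy helper */
}
```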

Re: [Cluster-devel] [PATCH] dlm: Allow large nodeids

2009-01-27 Thread David Teigland

On Tue, Jan 27, 2009 at 02:06:30PM -0600, David Teigland wrote:
 On Tue, Jan 27, 2009 at 11:33:30AM +, Chrissie Caulfield wrote:
  This is an updated patch that uses hlists rather than list_heads to save
  memory in the connection structure.
  
  Thanks to Steven Whitehouse for the suggestion.
 
 I fixed some checkpatch warnings, tested, and pushed into the next branch.

I take that back after hitting the following on unmount,

Pid: 4484, comm: umount Not tainted 2.6.29-rc2 #1
RIP: 0010:[a04ecfb4]  [a04ecfb4] foreach_conn+0x20/0x46 [dlm]
RSP: 0018:880072db5d38  EFLAGS: 00010202
RAX: 0001 RBX: 6b6b6b6b6b6b6b6b RCX: 
RDX: a04ed0dc RSI: 006b RDI: 880057998de0
RBP: 880072db5d58 R08:  R09: 880057998de8
R10:  R11: 88007dd428d8 R12: 
R13: a04ecede R14: 6000 R15: 0100
FS:  7fbce8f0b720() GS:80a33080() knlGS:f7f7a6c0
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7ff8aa8d38e8 CR3: 000138c4a000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process umount (pid: 4484, threadinfo 880072db4000, task 8800738d4740)
Stack:
 88007d187000  88007d187000 88007c145fa0
 880072db5d68 a04ed35a 880072db5d78 a04eaf20
 880072db5db8 a04eb299 880072db5da8 88007e85e198
Call Trace:
 [a04ed35a] dlm_lowcomms_stop+0x68/0x82 [dlm]
 [a04eaf20] threads_stop+0xe/0x15 [dlm]
 [a04eb299] dlm_release_lockspace+0x372/0x3a4 [dlm]
 [a02720e0] gdlm_unmount+0x28/0x49 [lock_dlm]
 [a047270f] gfs2_unmount_lockproto+0x2d/0x52 [gfs2]
 [a0476bcc] gfs2_lm_unmount+0x16/0x18 [gfs2]
 [a047afb7] gfs2_put_super+0x180/0x190 [gfs2]
 [802afadc] generic_shutdown_super+0x73/0xe8
 [802afb73] kill_block_super+0x22/0x3a
 [a0476953] gfs2_kill_sb+0x63/0x78 [gfs2]
 [802afc5c] deactivate_super+0x68/0x7d
 [802c2aaf] mntput_no_expire+0x103/0x149
 [802c3094] sys_umount+0x2e2/0x341
 [8020c05b] system_call_fastpath+0x16/0x1b
Code: 23 fe df 48 89 d8 5b 41 5c c9 c3 55 48 89 e5 41 55 49 89 fd 41 54 45 31 
e4 53 48 83 ec 08 4a 8b 1c e5 e0 79 50 a0 48 85 db 74 15 48 8b 03 48 8d bb d0 
fe ff ff 0f 18 08 41 ff d5 48 8b 1b eb e6
RIP  [a04ecfb4] foreach_conn+0x20/0x46 [dlm]
 RSP 880072db5d38

[Cluster-devel] cluster3 logging config

2009-02-20 Thread David Teigland

I have a suggestion to improve our logging config.  The format below is the
default configuration (more or less, the corosync systems aren't sending
anything to syslog, but cman.log seems full of info-like stuff, but it's
beside the point):

   <logging to_syslog="yes" to_logfile="yes" syslog_facility="daemon"
            syslog_priority="info" logfile_priority="info">
      <logging_subsys="qdiskd"
            logfile="/var/log/cluster/qdisk.log"/>
      <logging_subsys="groupd"
            logfile="/var/log/cluster/groupd.log"/>
      <logging_subsys="fenced"
            logfile="/var/log/cluster/fenced.log"/>
      <logging_subsys="dlm_controld"
            logfile="/var/log/cluster/dlm_controld.log"/>
      <logging_subsys="gfs_controld"
            logfile="/var/log/cluster/gfs_controld.log"/>
      <logging_subsys="rgmanager"
            logfile="/var/log/cluster/rgmanager.log"/>
      <logging_subsys="CLM"
            logfile="/var/log/cluster/cman.log"/>
      <logging_subsys="CPG"
            logfile="/var/log/cluster/cman.log"/>
      <logging_subsys="MAIN"
            logfile="/var/log/cluster/cman.log"/>
      <logging_subsys="SERV"
            logfile="/var/log/cluster/cman.log"/>
      <logging_subsys="CMAN"
            logfile="/var/log/cluster/cman.log"/>
      <logging_subsys="TOTEM"
            logfile="/var/log/cluster/cman.log"/>
      <logging_subsys="QUORUM"
            logfile="/var/log/cluster/cman.log"/>
      <logging_subsys="CONFDB"
            logfile="/var/log/cluster/cman.log"/>
      <logging_subsys="CONFDB"
            logfile="/var/log/cluster/cman.log"/>
   </logging>

Now, I just realized that I've missed some corosync subsystems, EVT, and CKPT
is probably one?, and maybe some others, I don't know.  The point is, to make
a change to corosync in general, a user has to go and list every single one
of these things, repeating the same info for each.  That's a big pain, and
definitely not intuitive.  I realize it can be useful to enable debugging for
select corosync subsystems, so that should still be possible.

I suggest the following, notice the final corosync entry,

   <logging to_syslog="yes" to_logfile="yes" syslog_facility="daemon"
            syslog_priority="info" logfile_priority="info">
      <logging_daemon="qdiskd"
            logfile="/var/log/cluster/qdisk.log"/>
      <logging_daemon="groupd"
            logfile="/var/log/cluster/groupd.log"/>
      <logging_daemon="fenced"
            logfile="/var/log/cluster/fenced.log"/>
      <logging_daemon="dlm_controld"
            logfile="/var/log/cluster/dlm_controld.log"/>
      <logging_daemon="gfs_controld"
            logfile="/var/log/cluster/gfs_controld.log"/>
      <logging_daemon="rgmanager"
            logfile="/var/log/cluster/rgmanager.log"/>
      <logging_daemon="corosync"
            logfile="/var/log/cluster/corosync.log"/>
   </logging>

the corosync entry would apply to *all* corosync subsystems by default.
We can still allow per-subsystem configuration,

   <logging_daemon="corosync" subsys="QUORUM"
         logfile="/var/log/cluster/corosync-quorum.log"/>
   <logging_daemon="corosync" subsys="TOTEM"
         logfile="/var/log/cluster/corosync-totem.log"/>
   ...
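
The intended lookup order (a per-subsystem entry wins, otherwise the daemon-wide entry applies) can be sketched as follows. struct log_entry and lookup_logfile are hypothetical names for illustration only; this is not how corosync or the cluster daemons actually parse the config.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one logging_daemon entry. */
struct log_entry {
	const char *daemon;	/* e.g. "corosync" */
	const char *subsys;	/* NULL means "applies to every subsystem" */
	const char *logfile;
};

/* Return the logfile for (daemon, subsys): an exact subsys match wins,
 * otherwise fall back to the daemon-wide entry, otherwise NULL. */
static const char *lookup_logfile(const struct log_entry *tbl, size_t n,
				  const char *daemon, const char *subsys)
{
	const char *fallback = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(tbl[i].daemon, daemon) != 0)
			continue;
		if (!tbl[i].subsys)
			fallback = tbl[i].logfile;
		else if (subsys && strcmp(tbl[i].subsys, subsys) == 0)
			return tbl[i].logfile;
	}
	return fallback;
}
```

With a table holding the plain corosync entry and a QUORUM override, a lookup for TOTEM would land on corosync.log while QUORUM would get its own file.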

Re: [Cluster-devel] cluster3 logging config

2009-02-20 Thread David Teigland

On Fri, Feb 20, 2009 at 10:00:03AM -0600, David Teigland wrote:
 I suggest the following, notice the final corosync entry,
 
    <logging to_syslog="yes" to_logfile="yes" syslog_facility="daemon"
             syslog_priority="info" logfile_priority="info">
       <logging_daemon="qdiskd"
             logfile="/var/log/cluster/qdisk.log"/>
       <logging_daemon="groupd"
             logfile="/var/log/cluster/groupd.log"/>
       <logging_daemon="fenced"
             logfile="/var/log/cluster/fenced.log"/>
       <logging_daemon="dlm_controld"
             logfile="/var/log/cluster/dlm_controld.log"/>
       <logging_daemon="gfs_controld"
             logfile="/var/log/cluster/gfs_controld.log"/>
       <logging_daemon="rgmanager"
             logfile="/var/log/cluster/rgmanager.log"/>
       <logging_daemon="corosync"
             logfile="/var/log/cluster/corosync.log"/>
    </logging>
 
 the corosync entry would apply to *all* corosync subsystems by default.
 We can still allow per-subsystem configuration,
 
    <logging_daemon="corosync" subsys="QUORUM"
          logfile="/var/log/cluster/corosync-quorum.log"/>
    <logging_daemon="corosync" subsys="TOTEM"
          logfile="/var/log/cluster/corosync-totem.log"/>
    ...

Correcting the XML,

  <logging_daemon name="qdiskd"
        logfile="/var/log/cluster/qdisk.log"/>
  ...

  <logging_daemon name="corosync" subsys="QUORUM"
        logfile="/var/log/cluster/corosync-quorum.log"/>

Re: [Cluster-devel] unfencing

2009-02-23 Thread David Teigland

On Mon, Feb 23, 2009 at 07:52:55PM +0100, Fabio M. Di Nitto wrote:
  A node unfences *itself* when it boots up.  As such, power-unfencing doesn't
  make sense; unfencing is only meant to reverse storage fencing.
 
 What can stop a user to run fence_node -U from another node to do remote
 (un)fencing?

It would work.  Users can do anything they like, that's beside the point.

The point is to make storage fencing more practical by automating storage
unfencing.  Otherwise, users have to invent ad hoc methods of doing it
themselves, often manually.  And, we end up solving the problem in painful,
one-off cases like scsi_reserve/fence_scsi, which cry out for a better
approach.

 How do we address the problem of nodes booting from that same shared
 storage?

Use power fencing (that's not the problem I'm trying to solve.)

Dave

Re: [Cluster-devel] unfencing

2009-02-23 Thread David Teigland

On Mon, Feb 23, 2009 at 01:36:04PM -0600, Ryan O'Hara wrote:
 What happens if unfencing fails? Is it safe to say that a node that
 fails to unfence itself will be prohibited from joining the fence
 domain? This is important for fence_scsi, since unfencing is
 equivalent to re-registering with the scsi devices. Failure to
 unfence (ie. register) precludes that node from being able to fence
 other nodes. I'm not sure if other fencing methods have this type of
 requirement.

Good point, it would probably be as simple as init.d/cman exiting with a
failure if fence_node -U fails.

Dave

Re: [Cluster-devel] unfencing

2009-02-23 Thread David Teigland

On Mon, Feb 23, 2009 at 02:24:13PM -0600, Ryan O'Hara wrote:
 On Mon, Feb 23, 2009 at 01:09:58PM -0600, David Teigland wrote:
  On Mon, Feb 23, 2009 at 07:52:55PM +0100, Fabio M. Di Nitto wrote:
   What can stop a user to run fence_node -U from another node to do remote
   (un)fencing?
  
  It would work.  Users can do anything they like, that's beside the point.
 
 It would not work for scsi reservations. With scsi reservations, an
 unfence operation is as simple as registering with the device(s). It
 cannot be done remotely. A registration exists on an IT nexus; the
 relationship between initiator and target. Bottom line is that a
 remote node cannot register another node --- the registration
 (sg_persist command) has to be run on the node that wants to unfence
 itself.

OK, thanks, that's good to keep in mind.  The other scheme I mentioned
originally where *other* nodes would unfence a node (instead of
self-unfencing) wouldn't work for scsi.

Dave

Re: [Cluster-devel] unfencing

2009-02-26 Thread David Teigland

On Thu, Feb 26, 2009 at 07:51:57AM +0100, Fabio M. Di Nitto wrote:
 On Mon, 2009-02-23 at 13:09 -0600, David Teigland wrote:
  On Mon, Feb 23, 2009 at 07:52:55PM +0100, Fabio M. Di Nitto wrote:
    A node unfences *itself* when it boots up.  As such, power-unfencing doesn't
    make sense; unfencing is only meant to reverse storage fencing.
   
   What can stop a user to run fence_node -U from another node to do remote
   (un)fencing?
  
  It would work.  Users can do anything they like, that's beside the point.
 
 I was thinking about 2 little points..
 
 Given the time at which fence_node -U will fire, you probably want to
 add a cman_init + cman_is_active + cman_finish loop in fence_node to
 make sure cman is ready to reply to our ccs queries, otherwise we might
 have a race condition at boot time (it might be already there.. didn't
 really check the code). All our daemons do that to give cman time to
 bootstrap.

Yes, good point.  I wonder if we'd be better off having cman_tool join
effectively do an is_active wait before exiting?  Then we could probably
avoid doing it many other places.  (It's also annoying when corosync crashes
after is_active completes, but before I've read what I need from cman/ccs.)

 The second thing would be to set a minimal protection mechanism by
 allowing fence_node -U to be fired only for the node that it is invoking
 it. So if we run on node A, fence_node -U can only execute unfencing
 operations for node A. For testing purposes then we could add a manual
 override such as --i-understand-this-operation-can-destroy-the-world.

I plan to use fence_node -U (no name) to unfence self.  I'm inclined to
just allow any node name after that, but not advertise it.
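
The is_active wait discussed above amounts to a bounded retry loop. Here is a hedged sketch: probe_fn, wait_until_active and ready_after are hypothetical names, and the callback stands in for whatever cman_init / cman_is_active / cman_finish sequence the real tool would run. No libcman calls are made here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe type: returns true once the service answers. */
typedef bool (*probe_fn)(void *arg);

/* Poll `probe` up to `max_tries` times, calling `pause_fn` (if any)
 * between attempts.  Returns the 1-based number of the attempt that
 * succeeded, or 0 if every attempt failed. */
static int wait_until_active(probe_fn probe, void *arg,
			     int max_tries, void (*pause_fn)(void))
{
	int i;

	for (i = 1; i <= max_tries; i++) {
		if (probe(arg))
			return i;
		if (pause_fn && i < max_tries)
			pause_fn();
	}
	return 0;
}

/* Demonstration probe: reports "not ready" for the next *arg calls. */
static bool ready_after(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return false;
	}
	return true;
}
```

Putting a loop like this in one place (for example in cman_tool join, as suggested) would spare each daemon its own copy.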

Re: [Cluster-devel] [PATCH] dlm: Allow large nodeids

2009-03-06 Thread David Teigland

On Wed, Jan 28, 2009 at 11:27:35AM +, Chrissie Caulfield wrote:
 David Teigland wrote:
  On Tue, Jan 27, 2009 at 02:06:30PM -0600, David Teigland wrote:
  On Tue, Jan 27, 2009 at 11:33:30AM +, Chrissie Caulfield wrote:
  This is an updated patch that uses hlists rather than list_heads to save
  memory in the connection structure.

This patch (with fix) seems to cause the following about half of the time when
killing dlm_controld:

dlm: x: leaving the lockspace group...
dlm: x: group event done 0 0
dlm: x: release_lockspace final free
dlm: closing connection to node 1
general protection fault:  [#1] SMP
last sysfs file: /sys/kernel/dlm/x/event_done
CPU 1
Modules linked in: lock_dlm dlm gfs2 configfs autofs4 sunrpc ipv6
 cpufreq_ondemand dm_multipath video output sbs sbshc battery ac parport_pc lp
 parport sg button serio_raw tg3 libphy i2c_nforce2 i2c_core pcspkr dm_snapshot
 dm_zero dm_mirror dm_region_hash dm_log dm_mod qla2xxx scsi_transport_fc shpchp
 mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd
 ohci_hcd ehci_hcd
Pid: 10416, comm: dlm_controld Not tainted 2.6.29-rc2 #1
RIP: 0010:[a045116a]  [a045116a] __find_con+0x17/0x35 [dlm]
RSP: 0018:88007b189da8  EFLAGS: 00010202
RAX: 880078ccfde8 RBX: 0001 RCX: 6b6b6b6b6b6b6b6b
RDX: 6b6b6b6b6b6b6b6b RSI: 0022 RDI: 0001
RBP: 88007b189da8 R08:  R09: 88007b189d48
R10:  R11:  R12: 
R13: 0001 R14: a0462960 R15: 88007dd52de0
FS:  7f71554c06e0() GS:88007f682210() knlGS:f7ef76c0
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7f111c3ce000 CR3: 7e92a000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process dlm_controld (pid: 10416, threadinfo 88007b188000, task 88007e4783c0)
Stack:
 88007b189dd8 a04514ea a026d61f 0001
 880078d12b50 a04629d0 88007b189df8 a045169c
 88007b1935f8 880078d12b50 88007b189e18 a0446921
Call Trace:
 [a04514ea] nodeid2con+0x29/0x1b7 [dlm]
 [a026d61f] ? configfs_rmdir+0x203/0x277 [configfs]
 [a045169c] dlm_lowcomms_close+0x24/0x48 [dlm]
 [a0446921] drop_comm+0x29/0x55 [dlm]
 [a026be0c] client_drop_item+0x25/0x31 [configfs]
 [a026d63d] configfs_rmdir+0x221/0x277 [configfs]
 [804d0609] ? _spin_unlock+0x26/0x2a
 [802b5ca9] vfs_rmdir+0xc5/0x137
 [802b7c00] do_rmdir+0xb5/0x107
 [8026f0a0] ? audit_syscall_entry+0x16b/0x19e
 [802b7c89] sys_rmdir+0x11/0x13
 [8020c05b] system_call_fastpath+0x16/0x1b
Code: c7 80 34 46 a0 31 db e8 b1 d9 07 e0 48 89 d8 5b 41 5c c9 c3 48 89 f8 55 83
 e0 1f 48 8b 14 c5 e0 bb 46 a0 48 89 e5 48 85 d2 74 1a 39 ba d8 fe ff ff 48 8b
 0a 48 8d 82 d0 fe ff ff 0f 18 09 74 07
RIP  [a045116a] __find_con+0x17/0x35 [dlm]
 RSP 88007b189da8

[Cluster-devel] [PATCH 2/7] dlm: use ipv6_addr_copy

2009-03-24 Thread David Teigland

From: Joe Perches j...@perches.com

Signed-off-by: Joe Perches j...@perches.com
Signed-off-by: David Teigland teigl...@redhat.com
---
 fs/dlm/lowcomms.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 103a5eb..bf09262 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -53,6 +53,7 @@
 #include <linux/mutex.h>
 #include <linux/sctp.h>
 #include <net/sctp/user.h>
+#include <net/ipv6.h>
 
 #include "dlm_internal.h"
 #include "lowcomms.h"
@@ -250,8 +251,7 @@ static int nodeid_to_addr(int nodeid, struct sockaddr *retaddr)
 	} else {
 		struct sockaddr_in6 *in6  = (struct sockaddr_in6 *) &addr;
 		struct sockaddr_in6 *ret6 = (struct sockaddr_in6 *) retaddr;
-		memcpy(&ret6->sin6_addr, &in6->sin6_addr,
-		       sizeof(in6->sin6_addr));
+		ipv6_addr_copy(&ret6->sin6_addr, &in6->sin6_addr);
 	}
 
 	return 0;
-- 
1.5.5.6

[Cluster-devel] [PATCH 4/7] dlm: replace idr with hash table for connections

2009-03-24 Thread David Teigland

From: Christine Caulfield ccaul...@redhat.com

Integer nodeids can be too large for the idr code; use a hash
table instead.

Signed-off-by: Christine Caulfield ccaul...@redhat.com
Signed-off-by: David Teigland teigl...@redhat.com
---
 fs/dlm/lowcomms.c |  171 
 1 files changed, 92 insertions(+), 79 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 982314c..609108a 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -2,7 +2,7 @@
 ***
 **
 **  Copyright (C) Sistina Software, Inc.  1997-2003  All rights reserved.
-**  Copyright (C) 2004-2007 Red Hat, Inc.  All rights reserved.
+**  Copyright (C) 2004-2009 Red Hat, Inc.  All rights reserved.
 **
 **  This copyrighted material is made available to anyone wishing to use,
 **  modify, copy, or redistribute it subject to the terms and conditions
@@ -48,7 +48,6 @@
 #include <net/sock.h>
 #include <net/tcp.h>
 #include <linux/pagemap.h>
-#include <linux/idr.h>
 #include <linux/file.h>
 #include <linux/mutex.h>
 #include <linux/sctp.h>
@@ -61,6 +60,7 @@
 #include "config.h"
 
 #define NEEDED_RMEM (4*1024*1024)
+#define CONN_HASH_SIZE 32
 
 struct cbuf {
unsigned int base;
@@ -115,6 +115,7 @@ struct connection {
int retries;
 #define MAX_CONNECT_RETRIES 3
int sctp_assoc;
+   struct hlist_node list;
struct connection *othercon;
struct work_struct rwork; /* Receive workqueue */
struct work_struct swork; /* Send workqueue */
@@ -139,14 +140,37 @@ static int dlm_local_count;
 static struct workqueue_struct *recv_workqueue;
 static struct workqueue_struct *send_workqueue;
 
-static DEFINE_IDR(connections_idr);
+static struct hlist_head connection_hash[CONN_HASH_SIZE];
 static DEFINE_MUTEX(connections_lock);
-static int max_nodeid;
 static struct kmem_cache *con_cache;
 
 static void process_recv_sockets(struct work_struct *work);
 static void process_send_sockets(struct work_struct *work);
 
+
+/* This is deliberately very simple because most clusters have simple
+   sequential nodeids, so we should be able to go straight to a connection
+   struct in the array */
+static inline int nodeid_hash(int nodeid)
+{
+	return nodeid & (CONN_HASH_SIZE-1);
+}
+
+static struct connection *__find_con(int nodeid)
+{
+	int r;
+	struct hlist_node *h;
+	struct connection *con;
+
+	r = nodeid_hash(nodeid);
+
+	hlist_for_each_entry(con, h, &connection_hash[r], list) {
+		if (con->nodeid == nodeid)
+			return con;
+	}
+	return NULL;
+}
+
 /*
  * If 'allocation' is zero then we don't attempt to create a new
  * connection structure for this node.
@@ -155,31 +179,17 @@ static struct connection *__nodeid2con(int nodeid, gfp_t alloc)
 {
 	struct connection *con = NULL;
 	int r;
-	int n;
 
-	con = idr_find(&connections_idr, nodeid);
+	con = __find_con(nodeid);
 	if (con || !alloc)
 		return con;
 
-	r = idr_pre_get(&connections_idr, alloc);
-	if (!r)
-		return NULL;
-
 	con = kmem_cache_zalloc(con_cache, alloc);
 	if (!con)
 		return NULL;
 
-	r = idr_get_new_above(&connections_idr, con, nodeid, &n);
-	if (r) {
-		kmem_cache_free(con_cache, con);
-		return NULL;
-	}
-
-	if (n != nodeid) {
-		idr_remove(&connections_idr, n);
-		kmem_cache_free(con_cache, con);
-		return NULL;
-	}
+	r = nodeid_hash(nodeid);
+	hlist_add_head(&con->list, &connection_hash[r]);
 
 	con->nodeid = nodeid;
 	mutex_init(&con->sock_mutex);
@@ -190,19 +200,30 @@ static struct connection *__nodeid2con(int nodeid, gfp_t alloc)
 
 	/* Setup action pointers for child sockets */
 	if (con->nodeid) {
-		struct connection *zerocon = idr_find(&connections_idr, 0);
+		struct connection *zerocon = __find_con(0);
 
 		con->connect_action = zerocon->connect_action;
 		if (!con->rx_action)
 			con->rx_action = zerocon->rx_action;
 	}
 
-	if (nodeid > max_nodeid)
-		max_nodeid = nodeid;
-
 	return con;
 }
 
+/* Loop round all connections */
+static void foreach_conn(void (*conn_func)(struct connection *c))
+{
+	int i;
+	struct hlist_node *h, *n;
+	struct connection *con;
+
+	for (i = 0; i < CONN_HASH_SIZE; i++) {
+		hlist_for_each_entry_safe(con, h, n, &connection_hash[i], list){
+			conn_func(con);
+		}
+	}
+}
+
 static struct connection *nodeid2con(int nodeid, gfp_t allocation)
 {
 	struct connection *con;
@@ -218,14 +239,17 @@ static struct connection *assoc2con(int assoc_id)
 {
 	int i;
+   struct
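
For readers without the kernel tree at hand, the bucket lookup this patch introduces can be sketched in userspace as below. This is a simplified illustration under stated assumptions: struct conn, find_con and get_con are hypothetical names, a plain singly linked chain stands in for the kernel's hlist, and there is no locking. Only CONN_HASH_SIZE and the power-of-two mask are taken from the patch.

```c
#include <stddef.h>
#include <stdlib.h>

#define CONN_HASH_SIZE 32	/* must be a power of two for the mask below */

struct conn {
	int nodeid;
	struct conn *next;	/* stand-in for the kernel's hlist_node */
};

static struct conn *conn_hash[CONN_HASH_SIZE];

/* Small sequential nodeids map straight to distinct buckets. */
static unsigned int nodeid_hash(int nodeid)
{
	return (unsigned int)nodeid & (CONN_HASH_SIZE - 1);
}

/* Walk one bucket's chain looking for an exact nodeid match. */
static struct conn *find_con(int nodeid)
{
	struct conn *c;

	for (c = conn_hash[nodeid_hash(nodeid)]; c; c = c->next)
		if (c->nodeid == nodeid)
			return c;
	return NULL;
}

/* Find the connection for nodeid, creating it at the bucket head if
 * it does not exist yet.  Returns NULL only on allocation failure. */
static struct conn *get_con(int nodeid)
{
	struct conn *c = find_con(nodeid);
	unsigned int b;

	if (c)
		return c;
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->nodeid = nodeid;
	b = nodeid_hash(nodeid);
	c->next = conn_hash[b];
	conn_hash[b] = c;
	return c;
}
```

Unlike an index keyed directly by nodeid, the table size is fixed, so a very large nodeid costs no more than a small one; colliding ids (1 and 33 here) simply share a chain.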

[Cluster-devel] [PATCH 1/7] dlm: Change rwlock which is only used in write mode to a spinlock

2009-03-24 Thread David Teigland

From: Steven Whitehouse swhit...@redhat.com

The ls_dirtbl[].lock was an rwlock, but since it was only used in write
mode a spinlock will suffice.

Signed-off-by: Steven Whitehouse swhit...@redhat.com
Signed-off-by: David Teigland teigl...@redhat.com
---
 fs/dlm/dir.c  |   18 +-
 fs/dlm/dlm_internal.h |2 +-
 fs/dlm/lockspace.c|2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/dlm/dir.c b/fs/dlm/dir.c
index 92969f8..858fba1 100644
--- a/fs/dlm/dir.c
+++ b/fs/dlm/dir.c
@@ -156,7 +156,7 @@ void dlm_dir_remove_entry(struct dlm_ls *ls, int nodeid, char *name, int namelen
 
 	bucket = dir_hash(ls, name, namelen);
 
-	write_lock(&ls->ls_dirtbl[bucket].lock);
+	spin_lock(&ls->ls_dirtbl[bucket].lock);
 
 	de = search_bucket(ls, name, namelen, bucket);
 
@@ -173,7 +173,7 @@ void dlm_dir_remove_entry(struct dlm_ls *ls, int nodeid, char *name, int namelen
 	list_del(&de->list);
 	kfree(de);
  out:
-	write_unlock(&ls->ls_dirtbl[bucket].lock);
+	spin_unlock(&ls->ls_dirtbl[bucket].lock);
 }
 
 void dlm_dir_clear(struct dlm_ls *ls)
@@ -185,14 +185,14 @@ void dlm_dir_clear(struct dlm_ls *ls)
 	DLM_ASSERT(list_empty(&ls->ls_recover_list), );
 
 	for (i = 0; i < ls->ls_dirtbl_size; i++) {
-		write_lock(&ls->ls_dirtbl[i].lock);
+		spin_lock(&ls->ls_dirtbl[i].lock);
 		head = &ls->ls_dirtbl[i].list;
 		while (!list_empty(head)) {
 			de = list_entry(head->next, struct dlm_direntry, list);
 			list_del(&de->list);
 			put_free_de(ls, de);
 		}
-		write_unlock(&ls->ls_dirtbl[i].lock);
+		spin_unlock(&ls->ls_dirtbl[i].lock);
 	}
 }
 
@@ -307,17 +307,17 @@ static int get_entry(struct dlm_ls *ls, int nodeid, char *name,
 
 	bucket = dir_hash(ls, name, namelen);
 
-	write_lock(&ls->ls_dirtbl[bucket].lock);
+	spin_lock(&ls->ls_dirtbl[bucket].lock);
 	de = search_bucket(ls, name, namelen, bucket);
 	if (de) {
 		*r_nodeid = de->master_nodeid;
-		write_unlock(&ls->ls_dirtbl[bucket].lock);
+		spin_unlock(&ls->ls_dirtbl[bucket].lock);
 		if (*r_nodeid == nodeid)
 			return -EEXIST;
 		return 0;
 	}
 
-	write_unlock(&ls->ls_dirtbl[bucket].lock);
+	spin_unlock(&ls->ls_dirtbl[bucket].lock);
 
 	if (namelen > DLM_RESNAME_MAXLEN)
 		return -EINVAL;
@@ -330,7 +330,7 @@ static int get_entry(struct dlm_ls *ls, int nodeid, char *name,
 	de->length = namelen;
 	memcpy(de->name, name, namelen);
 
-	write_lock(&ls->ls_dirtbl[bucket].lock);
+	spin_lock(&ls->ls_dirtbl[bucket].lock);
 	tmp = search_bucket(ls, name, namelen, bucket);
 	if (tmp) {
 		kfree(de);
@@ -339,7 +339,7 @@ static int get_entry(struct dlm_ls *ls, int nodeid, char *name,
 		list_add_tail(&de->list, &ls->ls_dirtbl[bucket].list);
 	}
 	*r_nodeid = de->master_nodeid;
-	write_unlock(&ls->ls_dirtbl[bucket].lock);
+	spin_unlock(&ls->ls_dirtbl[bucket].lock);
 	return 0;
 }
 
diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index 076e86f..d01ca0a 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -99,7 +99,7 @@ struct dlm_direntry {
 
 struct dlm_dirtable {
 	struct list_head	list;
-	rwlock_t		lock;
+	spinlock_t		lock;
 };
 
 struct dlm_rsbtable {
diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index aa32e5f..cd8e2df 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -487,7 +487,7 @@ static int new_lockspace(char *name, int namelen, void **lockspace,
 		goto out_lkbfree;
 	for (i = 0; i < size; i++) {
 		INIT_LIST_HEAD(&ls->ls_dirtbl[i].list);
-		rwlock_init(&ls->ls_dirtbl[i].lock);
+		spin_lock_init(&ls->ls_dirtbl[i].lock);
 	}
 
 	INIT_LIST_HEAD(&ls->ls_waiters);
-- 
1.5.5.6

[Cluster-devel] [PATCH 6/7] dlm: ignore cancel on granted lock

2009-03-24 Thread David Teigland

Return immediately from dlm_unlock(CANCEL) if the lock is
granted and not being converted; there's nothing to cancel.

Signed-off-by: David Teigland teigl...@redhat.com
---
 fs/dlm/lock.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 8cb9204..205ec95 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -2186,6 +2186,13 @@ static int validate_unlock_args(struct dlm_lkb *lkb, struct dlm_args *args)
 			goto out;
 		}
 
+		/* there's nothing to cancel */
+		if (lkb->lkb_status == DLM_LKSTS_GRANTED &&
+		    !lkb->lkb_wait_type) {
+			rv = -EBUSY;
+			goto out;
+		}
+
 		switch (lkb->lkb_wait_type) {
 		case DLM_MSG_LOOKUP:
 		case DLM_MSG_REQUEST:
-- 
1.5.5.6
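
The check this patch adds can be isolated into a small userspace sketch. struct lkb_sketch, the LKSTS_* values and check_cancel are hypothetical stand-ins for the kernel's struct dlm_lkb fields and constants; only the condition itself (granted, and no operation in flight) and the -EBUSY result come from the patch.

```c
#include <errno.h>

/* Illustrative status values; the numbers are assumptions, not taken
 * from the kernel headers. */
enum { LKSTS_WAITING = 1, LKSTS_GRANTED = 2, LKSTS_CONVERT = 3 };

/* Hypothetical cut-down lock block: just the two fields the check reads. */
struct lkb_sketch {
	int status;	/* one of the LKSTS_* values */
	int wait_type;	/* non-zero while a request/convert is in flight */
};

/* Returns -EBUSY when there is nothing to cancel (the lock is granted
 * and no operation is pending), 0 when a cancel may proceed. */
static int check_cancel(const struct lkb_sketch *lkb)
{
	if (lkb->status == LKSTS_GRANTED && !lkb->wait_type)
		return -EBUSY;
	return 0;
}
```

A granted lock with a conversion in flight still passes the check, since that conversion is exactly what a cancel is meant to stop.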

Re: [Cluster-devel] cman init rework

2009-03-30 Thread David Teigland

On Thu, Mar 26, 2009 at 02:50:31PM +0100, Fabio M. Di Nitto wrote:
 In our current startup sequence, we do start a daemon, we make sure it
 starts, but we never check if it's actually working properly.

If there's no groupd_compat setting in cluster.conf, or if it's set to 2, then
groupd does compat detection when it starts up, looking for old cluster2
nodes that require compat mode.  This detection phase can sometimes take a
while.  Other daemons have to ask groupd about the mode it chose after the
detection phase, and retry for a while if it's still pending.  It might be
nice for the init script to wait for this detection phase to complete after
starting groupd.  To do this we can run 'group_tool compat' and loop until
pending doesn't show up in a grep.  We should probably loop for somewhere
around 10 seconds, there's no good predictable number.  If groupd is still
pending after that time, the init script should just continue since it's most
likely taking longer than expected.  Other daemons are already prepared to
wait for groupd to pick a mode during their startup.

Re: [Cluster-devel] cman init rework

2009-03-31 Thread David Teigland

On Tue, Mar 31, 2009 at 07:23:15AM +0200, Fabio M. Di Nitto wrote:
 On Mon, 2009-03-30 at 16:42 -0500, David Teigland wrote:
  On Thu, Mar 26, 2009 at 02:50:31PM +0100, Fabio M. Di Nitto wrote:
   In our current startup sequence, we do start a daemon, we make sure it
   starts, but we never check if it's actually working properly.
  
  If there's no groupd_compat setting in cluster.conf, or if it's set to 2, 
  then
  groupd does compat detection when it starts up, looking for old cluster2
  nodes that require compat mode.  This detection phase can sometimes take a
  while.  Other daemons have to ask groupd about the mode it chose after the
  detection phase, and retry for a while if it's still pending.  It might be
  nice for the init script to wait for this detection phase to complete after
  starting groupd.  To do this we can run 'group_tool compat' and loop until
  pending doesn't show up in a grep.  We should probably loop for somewhere
  around 10 seconds, there's no good predictable number.  If groupd is still
  pending after that time, the init script should just continue since it's 
  most
  likely taking longer than expected.  Other daemons are already prepared to
  wait for groupd to pick a mode during their startup.
 
 So far we specifically check for groupd_compat=0 to avoid starting
 groupd at all.
 
 Is this still correct?
 
 For other values of groupd_compat or none specified in the config, we
 start groupd.
 
 Should we wait no matter what or only when none or 2 are specified?

Only when none or 2 are specified, there's no detection when set to 0 or 1.
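
The decision above is small enough to write down as two predicates. This is a sketch only: GROUPD_COMPAT_UNSET and both function names are hypothetical, and the "start unless 0" rule is taken from the quoted description of the init script rather than confirmed in this reply.

```c
#include <stdbool.h>

/* Hypothetical marker for "groupd_compat is not present in cluster.conf". */
#define GROUPD_COMPAT_UNSET (-1)

/* Per the quoted description of the init script: groupd is skipped only
 * when groupd_compat is explicitly 0. */
static bool groupd_should_start(int compat)
{
	return compat != 0;
}

/* Compat detection runs only when the setting is absent or 2, so only
 * those cases are worth polling 'group_tool compat' for. */
static bool groupd_should_wait(int compat)
{
	return compat == GROUPD_COMPAT_UNSET || compat == 2;
}
```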

Re: [Cluster-devel] [PATCH] dlm_tool: Fix silly error:

2009-05-07 Thread David Teigland

On Thu, May 07, 2009 at 03:29:03PM +0100, Chrissie Caulfield wrote:
 If you do
 
# dlm_tool lockdump default
 
 on a lockspace with no locks in it, you get the error message:
 
can't read /sys/kernel/debug/dlm/default_locks: Success
 
 This patch puts a slightly more sensible error message in place.

Ack, thanks, could you also do the same for lockdebug?

[Cluster-devel] [PATCH 0/4] dlm patches for 2.6.31

2009-06-11 Thread David Teigland

Hi,

These are the pending dlm patches for the 2.6.31 merge.  They have all been in
linux-next for quite a while, and are all minor changes/fixes.
Dave


 fs/dlm/dir.c  |7 ---
 fs/dlm/lockspace.c|   17 -
 fs/dlm/lowcomms.c |   22 ++
 fs/dlm/lowcomms.h |3 ++-
 fs/dlm/member.c   |   19 +--
 fs/dlm/requestqueue.c |2 +-
 include/linux/dlm.h   |4 ++--
 7 files changed, 48 insertions(+), 26 deletions(-)


Christine Caulfield (1):
  dlm: connect to nodes earlier

David Teigland (2):
  dlm: fix use count with multiple joins
  dlm: use more NOFS allocation

Geert Uytterhoeven (1):
  dlm: Make name input parameter of {,dlm_}new_lockspace() const

[Cluster-devel] [PATCH 2/4] dlm: fix use count with multiple joins

2009-06-11 Thread David Teigland

When a lockspace was joined multiple times, the global dlm
use count was incremented when it should not have been.  This
caused the global dlm threads to not be stopped when all
lockspaces were eventually removed.

Signed-off-by: David Teigland teigl...@redhat.com
---
 fs/dlm/lockspace.c |   13 ++---
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index 82528d9..d489fcc 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -419,16 +419,14 @@ static int new_lockspace(const char *name, int namelen, void **lockspace,
 			break;
 		}
 		ls->ls_create_count++;
-		module_put(THIS_MODULE);
-		error = 1; /* not an error, return 0 */
+		*lockspace = ls;
+		error = 1;
 		break;
 	}
 	spin_unlock(&lslist_lock);
 
-	if (error < 0)
-		goto out;
 	if (error)
-		goto ret_zero;
+		goto out;
 
 	error = -ENOMEM;
 
@@ -583,7 +581,6 @@ static int new_lockspace(const char *name, int namelen, void **lockspace,
 	dlm_create_debug_file(ls);
 
 	log_debug(ls, "join complete");
- ret_zero:
 	*lockspace = ls;
 	return 0;
 
@@ -628,7 +625,9 @@ int dlm_new_lockspace(const char *name, int namelen, void **lockspace,
 	error = new_lockspace(name, namelen, lockspace, flags, lvblen);
 	if (!error)
 		ls_count++;
-	else if (!ls_count)
+	if (error > 0)
+		error = 0;
+	if (!ls_count)
 		threads_stop();
  out:
 	mutex_unlock(&ls_lock);
-- 
1.5.5.6
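
The counting rule after this fix can be modelled in a few lines of userspace C. join_lockspace and threads_running are hypothetical names; the argument stands in for the return value of new_lockspace() (negative on error, 0 when a lockspace was created, 1 when an existing one was joined again), and starting the threads on first use is an assumption about the surrounding code that the hunk does not show.

```c
static int ls_count;		/* number of distinct lockspaces */
static int threads_running;	/* 1 while the global threads are up */

/* Model of the patched dlm_new_lockspace() tail.  `create_result` plays
 * the role of new_lockspace()'s return value. */
static int join_lockspace(int create_result)
{
	int error = create_result;

	if (!ls_count)
		threads_running = 1;	/* assumed: threads start on first use */

	if (!error)
		ls_count++;		/* only a newly created lockspace counts */
	if (error > 0)
		error = 0;		/* repeat join is not an error */
	if (!ls_count)
		threads_running = 0;	/* nothing exists: stop the threads */
	return error;
}
```

A repeat join returns success without bumping ls_count, so the count reaches zero, and the threads stop, once the distinct lockspaces are released.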

[Cluster-devel] [PATCH 4/4] dlm: use more NOFS allocation

2009-06-11 Thread David Teigland

Change some GFP_KERNEL allocations to use either GFP_NOFS or
ls_allocation (when available) which the fs sets to GFP_NOFS.
The point is to prevent allocations from going back into the
cluster fs in places where that might lead to deadlock.

Signed-off-by: David Teigland teigl...@redhat.com
---
 fs/dlm/dir.c  |7 ---
 fs/dlm/lowcomms.c |6 +++---
 fs/dlm/member.c   |8 
 fs/dlm/requestqueue.c |2 +-
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/dlm/dir.c b/fs/dlm/dir.c
index 858fba1..c4dfa1d 100644
--- a/fs/dlm/dir.c
+++ b/fs/dlm/dir.c
@@ -49,7 +49,8 @@ static struct dlm_direntry *get_free_de(struct dlm_ls *ls, int len)
spin_unlock(&ls->ls_recover_list_lock);
 
if (!found)
-   de = kzalloc(sizeof(struct dlm_direntry) + len, GFP_KERNEL);
+   de = kzalloc(sizeof(struct dlm_direntry) + len,
+                ls->ls_allocation);
return de;
 }
 
@@ -211,7 +212,7 @@ int dlm_recover_directory(struct dlm_ls *ls)
 
dlm_dir_clear(ls);
 
-   last_name = kmalloc(DLM_RESNAME_MAXLEN, GFP_KERNEL);
+   last_name = kmalloc(DLM_RESNAME_MAXLEN, ls->ls_allocation);
if (!last_name)
goto out;
 
@@ -322,7 +323,7 @@ static int get_entry(struct dlm_ls *ls, int nodeid, char *name,
if (namelen > DLM_RESNAME_MAXLEN)
return -EINVAL;
 
-   de = kzalloc(sizeof(struct dlm_direntry) + namelen, GFP_KERNEL);
+   de = kzalloc(sizeof(struct dlm_direntry) + namelen, ls->ls_allocation);
if (!de)
return -ENOMEM;
 
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 2559a97..cdb580a 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -500,7 +500,7 @@ static void process_sctp_notification(struct connection *con,
return;
}
 
-   new_con = nodeid2con(nodeid, GFP_KERNEL);
+   new_con = nodeid2con(nodeid, GFP_NOFS);
if (!new_con)
return;
 
@@ -736,7 +736,7 @@ static int tcp_accept_from_sock(struct connection *con)
 *  the same time and the connections cross on the wire.
 *  In this case we store the incoming one in othercon
 */
-   newcon = nodeid2con(nodeid, GFP_KERNEL);
+   newcon = nodeid2con(nodeid, GFP_NOFS);
if (!newcon) {
result = -ENOMEM;
goto accept_err;
@@ -746,7 +746,7 @@ static int tcp_accept_from_sock(struct connection *con)
struct connection *othercon = newcon->othercon;
 
if (!othercon) {
-   othercon = kmem_cache_zalloc(con_cache, GFP_KERNEL);
+   othercon = kmem_cache_zalloc(con_cache, GFP_NOFS);
if (!othercon) {
log_print("failed to allocate incoming socket");
mutex_unlock(&newcon->sock_mutex);
diff --git a/fs/dlm/member.c b/fs/dlm/member.c
index 2afb770..b128775 100644
--- a/fs/dlm/member.c
+++ b/fs/dlm/member.c
@@ -48,7 +48,7 @@ static int dlm_add_member(struct dlm_ls *ls, int nodeid)
struct dlm_member *memb;
int w, error;
 
-   memb = kzalloc(sizeof(struct dlm_member), GFP_KERNEL);
+   memb = kzalloc(sizeof(struct dlm_member), ls->ls_allocation);
if (!memb)
return -ENOMEM;
 
@@ -143,7 +143,7 @@ static void make_member_array(struct dlm_ls *ls)
 
ls->ls_total_weight = total;
 
-   array = kmalloc(sizeof(int) * total, GFP_KERNEL);
+   array = kmalloc(sizeof(int) * total, ls->ls_allocation);
if (!array)
return;
 
@@ -226,7 +226,7 @@ int dlm_recover_members(struct dlm_ls *ls, struct dlm_recover *rv, int *neg_out)
continue;
log_debug(ls, "new nodeid %d is a re-added member", rv->new[i]);
 
-   memb = kzalloc(sizeof(struct dlm_member), GFP_KERNEL);
+   memb = kzalloc(sizeof(struct dlm_member), ls->ls_allocation);
if (!memb)
return -ENOMEM;
memb->nodeid = rv->new[i];
@@ -341,7 +341,7 @@ int dlm_ls_start(struct dlm_ls *ls)
int *ids = NULL, *new = NULL;
int error, ids_count = 0, new_count = 0;
 
-   rv = kzalloc(sizeof(struct dlm_recover), GFP_KERNEL);
+   rv = kzalloc(sizeof(struct dlm_recover), ls->ls_allocation);
if (!rv)
return -ENOMEM;
 
diff --git a/fs/dlm/requestqueue.c b/fs/dlm/requestqueue.c
index daa4183..7a2307c 100644
--- a/fs/dlm/requestqueue.c
+++ b/fs/dlm/requestqueue.c
@@ -35,7 +35,7 @@ void dlm_add_requestqueue(struct dlm_ls *ls, int nodeid, struct dlm_message *ms)
struct rq_entry *e;
int length = ms->m_header.h_length - sizeof(struct dlm_message);
 
-   e = kmalloc(sizeof(struct rq_entry) + length, GFP_KERNEL);
+   e = kmalloc(sizeof(struct rq_entry) + length, ls

[Cluster-devel] [PATCH 1/4] dlm: Make name input parameter of {, dlm_}new_lockspace() const

2009-06-11 Thread David Teigland

From: Geert Uytterhoeven <ge...@linux-m68k.org>

| fs/gfs2/lock_dlm.c:207: warning: passing argument 1 of 'dlm_new_lockspace' discards qualifiers from pointer target type

Signed-off-by: Geert Uytterhoeven <ge...@linux-m68k.org>
Signed-off-by: David Teigland <teigl...@redhat.com>
---
 fs/dlm/lockspace.c  |4 ++--
 include/linux/dlm.h |4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index cd8e2df..82528d9 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -384,7 +384,7 @@ static void threads_stop(void)
dlm_astd_stop();
 }
 
-static int new_lockspace(char *name, int namelen, void **lockspace,
+static int new_lockspace(const char *name, int namelen, void **lockspace,
 uint32_t flags, int lvblen)
 {
struct dlm_ls *ls;
@@ -614,7 +614,7 @@ static int new_lockspace(char *name, int namelen, void **lockspace,
return error;
 }
 
-int dlm_new_lockspace(char *name, int namelen, void **lockspace,
+int dlm_new_lockspace(const char *name, int namelen, void **lockspace,
  uint32_t flags, int lvblen)
 {
int error = 0;
diff --git a/include/linux/dlm.h b/include/linux/dlm.h
index b9cd386..0b3518c 100644
--- a/include/linux/dlm.h
+++ b/include/linux/dlm.h
@@ -81,8 +81,8 @@ struct dlm_lksb {
  * the cluster, the calling node joins it.
  */
 
-int dlm_new_lockspace(char *name, int namelen, dlm_lockspace_t **lockspace,
- uint32_t flags, int lvblen);
+int dlm_new_lockspace(const char *name, int namelen,
+ dlm_lockspace_t **lockspace, uint32_t flags, int lvblen);
 
 /*
  * dlm_release_lockspace
-- 
1.5.5.6
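
The warning quoted in this patch is what a compiler reports when a pointer to const data is passed to a parameter declared without const. A small sketch of the fixed shape, assuming a hypothetical helper that only reads the name (model_name_len and model_caller are illustration names, not dlm functions):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Takes the name as const char *, so read-only strings can be passed as-is. */
size_t model_name_len(const char *name, size_t max)
{
    size_t n;

    assert(name != NULL);
    n = strlen(name);
    return n < max ? n : max;
}

/* A caller that only holds a read-only name. */
size_t model_caller(void)
{
    static const char table_name[] = "alpha";

    /* With a plain char * parameter this call would discard the qualifier. */
    return model_name_len(table_name, 64);
}
```

Because the function never writes through the pointer, adding const changes nothing for existing callers that pass a plain char *.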
