On Sat, 2009-08-08 at 08:07 -0700, Steven Dake wrote:
> Can you try with wthread:2 in the totem config section?  If that works
> on a multinode cluster with cpgverify running on multiple nodes, it is
> good for merge.

This is a new patch that forks corosync after all the configuration is
done and right before totem is initialized. It passes the above test
case just fine. Tested on a 6-node cluster with 2 totem threads.

The previous patch hits the same bug that logsys_fork_completed fixes:
threads need to be created after fork(), not before it.
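
The reason is that fork() duplicates only the calling thread, so any
thread started earlier simply does not exist in the child. A minimal
sketch of that behaviour follows. It is not corosync code: the names
worker, ticks, stop and worker_alive_in_child are hypothetical and
exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_int ticks;	/* bumped by the worker while it runs */
static atomic_int stop;		/* set to ask the worker to exit */

/* Hypothetical worker: increments ticks roughly once per millisecond. */
static void *worker(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop)) {
		atomic_fetch_add(&ticks, 1);
		poll(NULL, 0, 1);	/* sleep about 1 ms */
	}
	return NULL;
}

/*
 * Fork a child and report whether the worker is making progress there.
 * Returns 1 if ticks advanced in the child, 0 if it did not, -1 on error.
 * start_before_fork selects whether the thread is created before fork()
 * (in the parent) or after it (in the child).
 */
int worker_alive_in_child(int start_before_fork)
{
	pthread_t tid;
	pid_t pid;
	int status;
	int alive = -1;

	atomic_store(&ticks, 0);
	atomic_store(&stop, 0);

	if (start_before_fork &&
	    pthread_create(&tid, NULL, worker, NULL) != 0)
		return -1;

	pid = fork();
	if (pid == 0) {
		int before, after;

		/* Child: only the thread that called fork() exists here. */
		if (!start_before_fork &&
		    pthread_create(&tid, NULL, worker, NULL) != 0)
			_exit(2);
		before = atomic_load(&ticks);
		poll(NULL, 0, 200);	/* give a worker time to run */
		after = atomic_load(&ticks);
		_exit(after > before ? 0 : 1);
	}

	if (pid > 0 && waitpid(pid, &status, 0) == pid &&
	    WIFEXITED(status) && WEXITSTATUS(status) != 2)
		alive = (WEXITSTATUS(status) == 0);

	/* Stop and reap the parent-side worker, if one was started. */
	if (start_before_fork) {
		atomic_store(&stop, 1);
		pthread_join(tid, NULL);
	}
	return alive;
}
```

Calling worker_alive_in_child(1) models creating threads before the
detach, and worker_alive_in_child(0) models creating them after it.
Only the second leaves a running worker in the process that survives
the fork.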

Fabio

> 
> Regards
> -steve
> 
> On Sat, 2009-08-08 at 06:27 +0200, Fabio M. Di Nitto wrote:
> > On Fri, 2009-08-07 at 10:16 -0700, Steven Dake wrote:
> > > This is a pretty big change...  Forking after all the init processes
> > > > complete may not work.  Are you certain this introduces no
> > > > hard-to-detect regression?  For example, totemudp creates threads
> > > > when wthread is set...
> > 
> > For all the tests I have done, I have seen no regression. This patch is
> > not urgently required in flatiron, so we can make it boil in trunk for a
> > longer period of time for people to test.
> > 
> > > 
> > > regards
> > > -steve
> > > 
> > > On Fri, 2009-08-07 at 13:20 +0200, Fabio M. Di Nitto wrote:
> > > > Detach tty as late as possible to give a chance to corosync startup
> > > > wrappers (cman and possibly others) to collect as much output
> > > > as possible from stderr in case of errors.
> > > > 
> > > > Signed-off-by: Fabio M. Di Nitto <[email protected]>
> > > > ---
> > > > :100644 100644 edaa69f... 3aa52df... M  exec/main.c
> > > >  exec/main.c |   12 ++++++++----
> > > >  1 files changed, 8 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/exec/main.c b/exec/main.c
> > > > index edaa69f..3aa52df 100644
> > > > --- a/exec/main.c
> > > > +++ b/exec/main.c
> > > > @@ -770,9 +770,6 @@ int main (int argc, char **argv)
> > > >                 }
> > > >         }
> > > >  
> > > > -       if (background)
> > > > -               corosync_tty_detach ();
> > > > -
> > > >         /*
> > > >          * Set round robin realtime scheduling with priority 99
> > > >          * Lock all memory to avoid page faults which may interrupt
> > > > @@ -886,7 +883,6 @@ int main (int argc, char **argv)
> > > >                 syslog (LOGSYS_LEVEL_ERROR, "%s", error_string);
> > > >                 corosync_exit_error (AIS_DONE_MAINCONFIGREAD);
> > > >         }
> > > > -       logsys_fork_completed();
> > > >  
> > > >         /*
> > > >          * Make sure required directory is present
> > > > @@ -1035,6 +1031,14 @@ int main (int argc, char **argv)
> > > >         coroipcs_ipc_init (&ipc_init_state);
> > > >  
> > > >         /*
> > > > +        * Now we are fully initialized
> > > > +        */
> > > > +       if (background) {
> > > > +               corosync_tty_detach ();
> > > > +       }
> > > > +       logsys_fork_completed();
> > > > +
> > > > +       /*
> > > >          * Start main processing loop
> > > >          */
> > > >         poll_run (corosync_poll_handle);
> > > 
> > 
> 
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
diff --git a/exec/main.c b/exec/main.c
index edaa69f..a6a3fae 100644
--- a/exec/main.c
+++ b/exec/main.c
@@ -770,9 +770,6 @@ int main (int argc, char **argv)
 		}
 	}
 
-	if (background)
-		corosync_tty_detach ();
-
 	/*
 	 * Set round robin realtime scheduling with priority 99
 	 * Lock all memory to avoid page faults which may interrupt
@@ -886,7 +883,6 @@ int main (int argc, char **argv)
 		syslog (LOGSYS_LEVEL_ERROR, "%s", error_string);
 		corosync_exit_error (AIS_DONE_MAINCONFIGREAD);
 	}
-	logsys_fork_completed();
 
 	/*
 	 * Make sure required directory is present
@@ -950,6 +946,14 @@ int main (int argc, char **argv)
 	}
 
 	/*
+	 * Now we are fully initialized
+	 */
+	if (background) {
+		corosync_tty_detach ();
+	}
+	logsys_fork_completed();
+
+	/*
 	 * Sleep for a while to let other nodes in the cluster
 	 * understand that this node has been away (if it was
 	 * an corosync restart).