Hi,

Please disable syslog in openais.conf and try again. The issue seems to be related to the fork() call and syslog().
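A rough sketch of that change, assuming your file uses the same logging block as the corosync configuration quoted later in this thread. The key names are taken from that quoted config; an openais.conf for an older release may use different keys, so please check the man page for your version:

```
logging {
        to_stderr: yes
        to_logfile: yes
        # only change relative to the quoted config: syslog output turned off
        to_syslog: no
        logfile: /tmp/corosync.log
}
```

The service has to be restarted for the new logging settings to take effect.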
hj

On Fri, Nov 13, 2009 at 1:08 PM, Giovanni Di Milia <gdimi...@cfa.harvard.edu> wrote:

> Thank you very much for your response.
>
> The only thing I really don't understand is: why doesn't this problem
> appear in all my simulations?
> I configured at least 7 pairs of virtual servers with VMware 2 and CentOS
> 5.3 and 5.4 (32 and 64 bits) and I never had this kind of problem!
>
> The only difference in the configuration is that I used private IPs for the
> simulations and public IPs for the real servers, but I don't think it is
> important.
>
> Thanks for your patience,
> Giovanni
>
>
> On Nov 13, 2009, at 1:36 PM, hj lee wrote:
>
> Hi,
>
> I have the same problem on CentOS 5.3 with pacemaker-1.0.5 and
> openais-0.80.5. This is an openais bug! There are two problems:
> 1. Starting the openais service sometimes segfaults. It is more likely to
>    happen if the openais service is started before syslog.
> 2. The segfault handler of openais calls syslog(). syslog() is one of the
>    UNSAFE functions that must not be called from a signal handler, because
>    it is a non-reentrant function.
>
> To fix this issue: get the openais source, find the sigsegv_handler function
> in exec/main.c and just comment out log_flush(), as shown below. Then
> recompile and install it (make and make install). log_flush() should be
> removed from all signal handlers in the openais code base. I am still not
> sure where the segfault occurs, but commenting out log_flush() prevents it.
>
> -------------------------------------------------------------------------
> static void sigsegv_handler (int num)
> {
>         signal (SIGSEGV, SIG_DFL);
> //      log_flush ();
>         raise (SIGSEGV);
> }
>
> Thanks
> hj
>
> On Thu, Nov 12, 2009 at 4:21 PM, Giovanni Di Milia <gdimi...@cfa.harvard.edu> wrote:
>
>> I set up a cluster of two CentOS 5.4 x86_64 servers with pacemaker 1.06
>> and corosync 1.1.2.
>>
>> I only installed the x86_64 packages (yum install pacemaker tries to
>> install the 32-bit ones as well).
>>
>> I configured a shared cluster IP (it's a public IP) and a cluster website.
>>
>> Everything works fine if I stop corosync on one of the two servers
>> (the services move from one machine to the other without problems), but if
>> I reboot one server, when it comes back up it cannot go online in the
>> cluster. I also noticed that there are several corosync threads, and if I
>> kill all of them and then start corosync again, everything works fine again.
>>
>> I don't know what is happening and I'm not able to reproduce the same
>> situation on some virtual servers!
>>
>> Thanks,
>> Giovanni
>>
>>
>> The configuration of corosync is the following:
>>
>> ##############################################
>> # Please read the corosync.conf.5 manual page
>> compatibility: whitetank
>>
>> aisexec {
>>         # Run as root - this is necessary to be able to manage resources with Pacemaker
>>         user: root
>>         group: root
>> }
>>
>> service {
>>         # Load the Pacemaker Cluster Resource Manager
>>         ver: 0
>>         name: pacemaker
>>         use_mgmtd: yes
>>         use_logd: yes
>> }
>>
>> totem {
>>         version: 2
>>
>>         # How long before declaring a token lost (ms)
>>         token: 5000
>>
>>         # How many token retransmits before forming a new configuration
>>         token_retransmits_before_loss_const: 10
>>
>>         # How long to wait for join messages in the membership protocol (ms)
>>         join: 1000
>>
>>         # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
>>         consensus: 2500
>>
>>         # Turn off the virtual synchrony filter
>>         vsftype: none
>>
>>         # Number of messages that may be sent by one processor on receipt of the token
>>         max_messages: 20
>>
>>         # Stagger sending the node join messages by 1..send_join ms
>>         send_join: 45
>>
>>         # Limit generated nodeids to 31-bits (positive signed integers)
>>         clear_node_high_bit: yes
>>
>>         # Disable encryption
>>         secauth: off
>>
>>         # How many threads to use for encryption/decryption
>>         threads: 0
>>
>>         # Optionally assign a fixed node id (integer)
>>         # nodeid: 1234
>>
>>         interface {
>>                 ringnumber: 0
>>
>>                 # The following values need to be set based on your environment
>>                 bindnetaddr: XXX.XXX.XXX.0 #here I put the right ip for my configuration
>>                 mcastaddr: 226.94.1.1
>>                 mcastport: 4000
>>         }
>> }
>>
>> logging {
>>         fileline: off
>>         to_stderr: yes
>>         to_logfile: yes
>>         to_syslog: yes
>>         logfile: /tmp/corosync.log
>>         debug: off
>>         timestamp: on
>>         logger_subsys {
>>                 subsys: AMF
>>                 debug: off
>>         }
>> }
>>
>> amf {
>>         mode: disabled
>> }
>>
>> ##################################################
>>
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> --
> Dream with longterm vision!
> kerdosa

--
Dream with longterm vision!
kerdosa
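
For reference, the point made in the quoted message (syslog() must not be called from a signal handler) can be sketched in C as below. This is only a minimal sketch under stated assumptions, not the openais code: the names safe_write, crash_handler and install_crash_handler are hypothetical, and the idea is simply to restrict the handler to calls that POSIX lists as async-signal-safe (signal, write, raise) instead of a buffered logger.

```c
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: emit a message using only write(2), which POSIX
 * lists as async-signal-safe. The length is computed by hand so that the
 * handler does not depend on any library string routine.
 * Returns the number of bytes written, or -1 on error. */
ssize_t safe_write(int fd, const char *msg)
{
    size_t len = 0;
    while (msg[len] != '\0')
        len++;
    return write(fd, msg, len);
}

/* Hypothetical handler: no syslog(), no log flushing, no malloc(). */
void crash_handler(int signum)
{
    /* Restore the default action so the re-raised signal terminates the
     * process (with a core dump if the system is configured for one). */
    signal(signum, SIG_DFL);
    safe_write(STDERR_FILENO, "fatal signal caught, exiting\n");
    raise(signum);
}

/* Install the handler for SIGSEGV. Returns 0 on success, -1 on failure. */
int install_crash_handler(void)
{
    return signal(SIGSEGV, crash_handler) == SIG_ERR ? -1 : 0;
}
```

A program would call install_crash_handler() once near the start of main(). Any log flushing would then have to happen outside the handler, which is the same effect as commenting out log_flush() in the quoted patch.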