Let me add some findings.
This is my config file (based on an example found via Google):
==========================
totem {
version: 2
# How long before declaring a token lost (ms)
token: 5000
# How many token retransmits before forming a new configuration
token_retransmits_before_loss_const: 20
# How long to wait for join messages in the membership protocol (ms)
join: 1000
# How long to wait for consensus to be achieved before starting a
# new round of membership configuration (ms)
consensus: 7500
# Turn off the virtual synchrony filter
vsftype: none
# Number of messages that may be sent by one processor on receipt of
# the token
max_messages: 20
# Disable encryption
secauth: off
# How many threads to use for encryption/decryption
threads: 0
# Limit generated nodeids to 31-bits (positive signed integers)
clear_node_high_bit: yes
# Optionally assign a fixed node id (integer)
# nodeid: 1234
interface {
ringnumber: 0
# The following three values need to be set based on your
# environment
bindnetaddr: 192.168.1.0
mcastaddr: 226.94.1.1
mcastport: 5405
}
}
logging {
fileline: off
to_syslog: yes
to_stderr: no
syslog_facility: daemon
debug: on
timestamp: on
}
amf {
mode: disabled
}
==============================
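As a cross-check of the bindnetaddr value above: it is meant to be the network address of the interface, not the host address. Here is a minimal sketch of that calculation. The netmask 255.255.255.0 is my assumption (it is not shown anywhere in this post), and the helper name bindnetaddr() is only illustrative.

```python
import ipaddress


def bindnetaddr(address: str, netmask: str) -> str:
    """Return the network address for an interface address and netmask."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    return str(iface.network.network_address)


# 192.168.1.1 is the interface corosync reports as up;
# the /24 netmask is an assumption, not taken from the post.
print(bindnetaddr("192.168.1.1", "255.255.255.0"))
```

With that assumed netmask the result matches the 192.168.1.0 used in the config; with a different netmask the value would have to change accordingly.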
The running corosync process:
-logs the following to syslog:
corosync[3837]: [MAIN ] Could not lock memory of service to avoid page faults
corosync[3837]: [MAIN ] Corosync Cluster Engine ('1.2.8'): started and ready to provide service.
corosync[3837]: [MAIN ] Corosync built-in features: nss
corosync[3837]: [MAIN ] Successfully read main configuration file '/opt/ha/etc/corosync/corosync.conf'.
corosync[3837]: [TOTEM ] Initializing transport (UDP/IP).
corosync[3837]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
corosync[3837]: [TOTEM ] The network interface [192.168.1.1] is now up.
corosync[3837]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
corosync[3837]: [SERV ] Service engine loaded: corosync configuration service
corosync[3837]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
corosync[3837]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
corosync[3837]: [SERV ] Service engine loaded: corosync profile loading service
corosync[3837]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
-sends 2 packets to the network on startup:
13:57:07.125097 IP 192.168.1.1 > 226.94.1.1: igmp v2 report 226.94.1.1
13:57:15.708138 IP 192.168.1.1 > 226.94.1.1: igmp v2 report 226.94.1.1
-has the following sockets:
# sockstat | grep corosync
root corosync 6563 3 dgram - /var/run/log
root corosync 6563 7 stream /opt/ha/var/run/corosync.ipc -
root corosync 6563 8 udp 226.94.1.1.netsupport *.*
root corosync 6563 9 udp 192.168.1.1.hpoms-dps-lstn *.*
root corosync 6563 10 udp 192.168.1.1.netsupport *.*
-has 4 LWPs
# ps axsp 6563
UID  PID PPID CPU LID NLWP PRI NI   VSZ  RSS WCHAN  STAT TTY    TIME COMMAND
  0 6563    1   0   4    4 191  0 23708 2708 parked I-   ?   2:59.94 ./corosync
  0 6563    1   0   3    4 191  0 23708 2708 select I-   ?   2:59.94 ./corosync
  0 6563    1   0   2    4 191  0 23708 2708 parked I-   ?   2:59.94 ./corosync
  0 6563    1   0   1    4 191  0 23708 2708 -      R    ?   2:59.94 ./corosync
-does not respond to clients such as corosync-cfgtool
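Since the capture only shows the two IGMP reports and no UDP traffic on the multicast group, one thing that could be checked independently of corosync is whether plain UDP multicast works on this interface at all. Below is a rough sketch of such a probe; it is not anything corosync itself provides. The function names are made up for illustration, and the group, port and interface address are simply copied from the config and log above. It should be run with corosync stopped (or with a different port), because the receiver binds the same port.

```python
import socket

GROUP = "226.94.1.1"   # mcastaddr from the config
PORT = 5405            # mcastport from the config
IFACE = "192.168.1.1"  # interface address reported by corosync


def build_mreq(group: str, iface: str) -> bytes:
    """Pack a struct ip_mreq: group address followed by interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def open_receiver(group: str = GROUP, port: int = PORT,
                  iface: str = IFACE) -> socket.socket:
    """Return a UDP socket that has joined the multicast group on iface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, iface))
    return sock


def send_probe(payload: bytes = b"probe", group: str = GROUP,
               port: int = PORT, iface: str = IFACE) -> int:
    """Send one datagram to the group via iface; return bytes sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # Force the outgoing interface and keep the packet on the local net.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton(iface))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        return sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

The idea would be to call open_receiver() on one node and recvfrom() on the returned socket, then call send_probe() on the other node while watching with tcpdump. If the probe never arrives, the problem is more likely in the network or the multicast setup than in corosync.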
I cannot use gdb to analyse the threads of a running process (maybe this is a
NetBSD issue). However, I could kill it with SIGABRT and analyse the core file:
=======================================
(gdb) info thr
4 process 69373 0xbbbae202 in pthread_rwlock_unlock () from /usr/lib/libpthread.so.0
3 process 134909 0xbbb14697 in _lwp_exit () from /usr/lib/libc.so.12
2 process 200445 0xbbadab67 in poll () from /usr/lib/libc.so.12
* 1 process 265981 0xbbadb5a7 in _lwp_park () from /usr/lib/libc.so.12
(gdb) bt
#0 0xbbadb5a7 in _lwp_park () from /usr/lib/libc.so.12
#1 0xbbbafb14 in pthread_cond_wait () from /usr/lib/libpthread.so.0
#2 0xbbbac8b1 in sem_wait () from /usr/lib/libpthread.so.0
#3 0x0804dbcf in corosync_exit_thread_handler (arg=0x0) at main.c:198
#4 0xbbbb19df in pthread_create () from /usr/lib/libpthread.so.0
#5 0xbbafd670 in swapcontext () from /usr/lib/libc.so.12
(gdb) thr 2
[Switching to thread 2 (process 200445)]#0 0xbbadab67 in poll () from /usr/lib/libc.so.12
(gdb) bt
#0 0xbbadab67 in poll () from /usr/lib/libc.so.12
#1 0xbbbad0f9 in poll () from /usr/lib/libpthread.so.0
#2 0x08050269 in prioritized_timer_thread (data=0x0) at timer.c:127
#3 0xbbbb19df in pthread_create () from /usr/lib/libpthread.so.0
#4 0xbbafd670 in swapcontext () from /usr/lib/libc.so.12
(gdb) thr 3
[Switching to thread 3 (process 134909)]#0 0xbbb14697 in _lwp_exit () from /usr/lib/libc.so.12
(gdb) bt
#0 0xbbb14697 in _lwp_exit () from /usr/lib/libc.so.12
#1 0xbbbb11f6 in pthread_exit () from /usr/lib/libpthread.so.0
#2 0xbbbc0f75 in logsys_worker_thread (data=0x0) at logsys.c:733
#3 0xbbbb19df in pthread_create () from /usr/lib/libpthread.so.0
#4 0xbbafd670 in swapcontext () from /usr/lib/libc.so.12
(gdb) thr 4
[Switching to thread 4 (process 69373)]#0 0xbbbae202 in pthread_rwlock_unlock () from /usr/lib/libpthread.so.0
(gdb) bt
#0 0xbbbae202 in pthread_rwlock_unlock () from /usr/lib/libpthread.so.0
#1 0x00000000 in ?? ()
=======================================
Is this information of any use?