Re: [Pacemaker] errors in corosync.log

2010-01-19 Thread Shravan Mishra
cibadmin 1.0.5 for OpenAIS and Heartbeat (Build:
9e9faaab40f3f97e3c0d623e4a4c47ed83fa1601)

-Shravan

On Tue, Jan 19, 2010 at 8:29 AM, Andrew Beekhof and...@beekhof.net wrote:
 On Sat, Jan 16, 2010 at 9:20 PM, Shravan Mishra
 shravan.mis...@gmail.com wrote:
 Hi Guys,

 I'm running the following versions of pacemaker and corosync:
 corosync=1.1.1-1-2
 pacemaker=1.0.9-2-1

 That pacemaker version doesn't exist.
 What does cibadmin --version say?

 And are you sure about the corosync version? It doesn't look right either.


 Everything had been running fine for quite some time, but then I
 started seeing the following errors in the corosync logs:


 =
 Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
 Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
 Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
 Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
 Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
 Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
 

 I can run all the crm shell commands and whatnot, but it's
 troubling that the above is happening.

 My crm_mon output looks good.
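As a rough illustration of what "invalid digest" means: with secauth: on, each packet carries a keyed digest, and a receiver that recomputes the digest with its own authkey and gets a different value drops the packet. A packet from a node holding a different key (for example another cluster on the same port, as Andrew and Steve suggest in this thread) would fail that check. This minimal sketch uses HMAC-SHA1 from Python's standard library purely as a stand-in; it does not reproduce corosync's actual packet layout, hash choice or key derivation, and the random keys only stand in for two different authkey files.

```python
import hashlib
import hmac
import os

# Simplified stand-in only: NOT corosync's real wire format or key handling.
DIGEST_LEN = hashlib.sha1().digest_size  # 20 bytes for SHA-1


def sign(key: bytes, payload: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA1 digest computed with `key`."""
    digest = hmac.new(key, payload, hashlib.sha1).digest()
    return digest + payload


def verify(key: bytes, packet: bytes) -> bool:
    """Recompute the digest with our own key; a mismatch is the kind of
    packet a receiver would report as having an invalid digest and ignore."""
    digest, payload = packet[:DIGEST_LEN], packet[DIGEST_LEN:]
    expected = hmac.new(key, payload, hashlib.sha1).digest()
    return hmac.compare_digest(digest, expected)


our_key = os.urandom(128)    # stands in for this cluster's authkey
other_key = os.urandom(128)  # stands in for a foreign cluster's authkey

print(verify(our_key, sign(our_key, b"token")))    # True
print(verify(our_key, sign(other_key, b"token")))  # False: digest mismatch
```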


 I also checked the authkey: I ran md5sum on both nodes and it's the same.

 Then I stopped corosync, regenerated the authkey with
 corosync-keygen and copied it to the other machine, but I still get
 the above message in the corosync log.
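The md5sum comparison described above can also be scripted, which helps when several nodes (or clusters) need checking. A minimal sketch using only Python's standard library; it compares MD5 digests exactly as running md5sum on each file would.

```python
import hashlib
from pathlib import Path


def file_md5(path: str) -> str:
    """Hex MD5 of a file's contents, the same value `md5sum` prints."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def keys_match(local_key: str, peer_key_copy: str) -> bool:
    """True if two authkey files have the same MD5 digest."""
    return file_md5(local_key) == file_md5(peer_key_copy)
```

For example, `keys_match("/etc/corosync/authkey", "/tmp/node2-authkey")`, where the first path is the usual default key location and the second is a hypothetical local copy of the other node's key (for instance fetched with scp).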

 Is there any other authkey that I should look into?


 corosync.conf

 

 # Please read the corosync.conf.5 manual page
 compatibility: whitetank

 totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 1500
        vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
        secauth: on
        threads: 0
        rrp_mode: passive

        interface {
                ringnumber: 0
                bindnetaddr: 192.168.2.0
                #mcastaddr: 226.94.1.1
                broadcast: yes
                mcastport: 5405
        }
        interface {
                ringnumber: 1
                bindnetaddr: 172.20.20.0
                #mcastaddr: 226.94.1.1
                broadcast: yes
                mcastport: 5405
        }
 }


 logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        to_syslog: yes
        logfile: /tmp/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
 }

 service {
        name: pacemaker
        ver: 0
 }

 aisexec {
        user: root
        group: root
 }

 amf {
        mode: disabled
 }


 ===
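To see at a glance which network and port each ring uses (the point the replies in this thread turn on), here is a minimal sketch that pulls the `interface` blocks out of corosync.conf text like the one above. It is a deliberately naive parser written for illustration, not corosync's own: it assumes `interface { ... }` blocks contain no nested braces and treats everything after `#` as a comment.

```python
import re

# Matches "interface { ... }"; assumes no nested braces inside the block.
INTERFACE_RE = re.compile(r"\binterface\s*\{([^}]*)\}")


def parse_interfaces(conf_text: str) -> list[dict[str, str]]:
    """Return one {key: value} dict per totem interface block."""
    interfaces = []
    for body in INTERFACE_RE.findall(conf_text):
        entry: dict[str, str] = {}
        for raw in body.splitlines():
            line = raw.split("#", 1)[0].strip()  # drop comments such as "#mcastaddr"
            if ":" in line:
                key, value = line.split(":", 1)
                entry[key.strip()] = value.strip()
        interfaces.append(entry)
    return interfaces
```

Applied to the configuration above, it yields two entries, both with mcastport 5405 and broadcast yes: one on 192.168.2.0 (ring 0) and one on 172.20.20.0 (ring 1).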


 Thanks
 Shravan

 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker




Re: [Pacemaker] errors in corosync.log

2010-01-19 Thread Shravan Mishra
Corosync Cluster Engine, version '1.1.1' SVN revision '2534'
Copyright (c) 2006-2009 Red Hat, Inc.

Shravan


On Tue, Jan 19, 2010 at 10:59 AM, Shravan Mishra
shravan.mis...@gmail.com wrote:
 cibadmin 1.0.5 for OpenAIS and Heartbeat (Build:
 9e9faaab40f3f97e3c0d623e4a4c47ed83fa1601)



Re: [Pacemaker] errors in corosync.log

2010-01-18 Thread Andrew Beekhof
On Sat, Jan 16, 2010 at 9:20 PM, Shravan Mishra
shravan.mis...@gmail.com wrote:
 Then I stopped corosync, regenerated the authkey with
 corosync-keygen and copied it to the other machine, but I still get
 the above message in the corosync log.

Are you sure there's not a third node somewhere broadcasting on that
mcast and port combination?
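One way to follow up on Andrew's question is to watch who is sending to the totem port. A minimal sketch, assuming it is run on a spare host on the same network segment that is not itself running corosync (on a cluster node corosync already has port 5405 bound, so this bind would likely fail). The member addresses in the usage note are hypothetical; the thread does not give the nodes' ring 0 addresses.

```python
import socket


def unexpected_senders(observed: list[str], members: set[str]) -> set[str]:
    """Source addresses seen on the totem port that are not cluster members."""
    return {addr for addr in observed if addr not in members}


def listen(port: int = 5405, count: int = 50, timeout: float = 30.0) -> list[str]:
    """Collect source IPs of up to `count` UDP datagrams arriving on `port`."""
    sources: list[str] = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # give up if the segment goes quiet
        sock.bind(("", port))     # all local addresses; broadcasts are received too
        try:
            while len(sources) < count:
                _data, (addr, _port) = sock.recvfrom(65535)
                sources.append(addr)
        except socket.timeout:
            pass  # return whatever was collected before the timeout
    return sources
```

For example, `unexpected_senders(listen(), {"192.168.2.1", "192.168.2.2"})` with those two (hypothetical) member addresses; any address it returns is a sender outside the cluster.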




Re: [Pacemaker] errors in corosync.log

2010-01-18 Thread Shravan Mishra
Hi,

Since the interfaces on the two nodes are connected via a crossover
cable, there is no chance of that happening. And since I'm using
rrp_mode: passive, I assume the other ring, i.e. ring 1, comes into
play only when ring 0 fails. I say this because the ring 1
interface is on the network.


One interesting thing I observed is that libtomcrypt is used for
crypto because I have secauth: on.

But I couldn't find that library on my machine.

I'm wondering if that's the cause.

Basically we are using three interfaces: eth0, eth1 and eth2.

eth0 and eth2 are for ring 0 and ring 1 respectively; eth1 is the
primary interface.

This is what my drbd.conf looks like:


==
# please have a look at the example configuration file in
# /usr/share/doc/drbd82/drbd.conf
#
global {
    usage-count no;
}
common {
    protocol C;
    startup {
        wfc-timeout 120;
        degr-wfc-timeout 120;
    }
}
resource var_nsm {
    syncer {
        rate 333M;
    }
    handlers {
        fence-peer /usr/lib/drbd/crm-fence-peer.sh;
        after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
    }
    net {
        after-sb-1pri discard-secondary;
    }
    on node1.itactics.com {
        device /dev/drbd1;
        disk /dev/sdb3;
        address 172.20.20.1:7791;
        meta-disk internal;
    }
    on node2.itactics.com {
        device /dev/drbd1;
        disk /dev/sdb3;
        address 172.20.20.2:7791;
        meta-disk internal;
    }
}
=


As I mentioned, the eth0 interfaces of the two nodes are connected via
a crossover cable, while eth1 and eth2 are on the network.

I'm not a networking expert, but is it possible that a broadcast sent by,
let's say, a node that is not in my cluster could still reach my nodes
through the other interfaces that are attached to the network?


We in dev and the QA guys are testing this in parallel.

Let's say there is a QA cluster of two nodes and a dev cluster of two nodes,
both with their interfaces hooked up as I described above, and the
corosync.conf of both clusters has bindnetaddr: 192.168.2.0.

Is it possible for one cluster to cause bad messages in the other?
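The dev/QA scenario described above can be checked mechanically against the configuration. A minimal sketch, assuming two rings from different clusters can hear each other when their subnets overlap and they use the same mcastport. The /24 prefix length is an assumption (the thread gives only bindnetaddr, and the real netmask comes from the local interface), and the cluster labels are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Ring:
    cluster: str      # hypothetical cluster label, e.g. "dev" or "qa"
    bindnetaddr: str  # as written in corosync.conf
    prefixlen: int    # ASSUMED netmask length; not stated in the thread
    mcastport: int


def may_collide(a: Ring, b: Ring) -> bool:
    """True if rings of two *different* clusters share a subnet and a port,
    so broadcasts from one could reach the other's totem socket."""
    if a.cluster == b.cluster:
        return False
    net_a = ipaddress.ip_network(f"{a.bindnetaddr}/{a.prefixlen}", strict=False)
    net_b = ipaddress.ip_network(f"{b.bindnetaddr}/{b.prefixlen}", strict=False)
    return net_a.overlaps(net_b) and a.mcastport == b.mcastport
```

With the thread's numbers, `may_collide(Ring("dev", "192.168.2.0", 24, 5405), Ring("qa", "192.168.2.0", 24, 5405))` returns True. This only compares configuration: if ring 0 really runs over a crossover cable, the two clusters' ring 0 traffic never shares a wire, so the physical topology still has to be checked.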


We are in the final leg of testing and this came up.

Thanks for the help.


Shravan






On Mon, Jan 18, 2010 at 2:58 AM, Andrew Beekhof and...@beekhof.net wrote:
 Are you sure there's not a third node somewhere broadcasting on that
 mcast and port combination?



Re: [Pacemaker] errors in corosync.log

2010-01-18 Thread Steven Dake
One possibility is that you have a different cluster on your network
using the same multicast address and port.

Regards
-steve

On Sat, 2010-01-16 at 15:20 -0500, Shravan Mishra wrote:
 Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid
 digest... ignoring.
 Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data