[Sequoia] error joining group after restart

BESSON-DEBLON, Pierre (SOGETI HIGH TECH) Tue, 12 Feb 2008 07:26:29 -0800

Hi,

A problem occurs when restarting a controller (architecture with 2 controllers 
using JGroups).


The restarted controller logged the following lines:
12 Feb 2008 09:31:16,717 | LEVEL1 | Using Hedera properties file: /hedera_jgroups.properties
12 Feb 2008 09:31:17,459 | LEVEL3 | Group communication channel is configured as follows: JGroups channel wrapper: [EMAIL PROTECTED]
12 Feb 2008 09:31:20,065 | LEVEL3 | Storing checkpoint Member(address=/172.20.1.77:65201, uid=db) joined group db-172.20.1.81:28346-20080212093120061+0000 at request id 0
12 Feb 2008 09:31:20,111 | LEVEL3 | Storing checkpoint Member(address=/172.20.1.77:65201, uid=db) quit group db-172.20.1.81:28346-20080212093120108+0000 at request id 0
12 Feb 2008 09:31:20,117 | LEVEL3 | Removing controller null
12 Feb 2008 09:31:20,117 | LEVEL3 | Refreshing members list:[Member(address=/172.20.1.81:65201, uid=db)]
12 Feb 2008 09:31:20,118 | LEVEL1 | 0 requests were waiting responses from Member(address=/172.20.1.77:65201, uid=db)
12 Feb 2008 09:31:22,073 | LEVEL1 | Group db connected to Member(address=/172.20.1.81:65201, uid=db)
12 Feb 2008 09:31:22,080 | LEVEL1 | First controller in group db

The controller that stayed up logged the following lines:
12 Feb 2008 09:30:56,413 | LEVEL3 | Storing checkpoint Member(address=/172.20.1.81:65201, uid=db) quit group db-172.20.1.77:28346-20080212093056405+0000 at request id 0
12 Feb 2008 09:30:56,416 | LEVEL1 | Member(address=/172.20.1.81:65201, uid=db) has left distributed virtual database db
12 Feb 2008 09:30:56,416 | LEVEL1 | Controller Member(address=/172.20.1.81:65201, uid=db) has left the cluster.
12 Feb 2008 09:30:56,416 | LEVEL1 | 0 requests were waiting responses from Member(address=/172.20.1.81:65201, uid=db)
12 Feb 2008 09:30:56,455 | LEVEL3 | handleMessageSingleThreaded (class org.continuent.sequoia.controller.virtualdatabase.protocol.FlushGroupCommunicationMessages): [EMAIL PROTECTED]
12 Feb 2008 09:30:56,456 | LEVEL3 | handleMessageMultiThreaded (class org.continuent.sequoia.controller.virtualdatabase.protocol.FlushGroupCommunicationMessages): [EMAIL PROTECTED]
12 Feb 2008 09:30:56,456 | LEVEL3 | Removed [EMAIL PROTECTED] from total order queue
12 Feb 2008 09:30:56,500 | LEVEL1 | Waiting 120000ms for client of controller 562949953421312 to failover
12 Feb 2008 09:31:19,765 | LEVEL3 | Storing checkpoint Member(address=/172.20.1.81:65201, uid=db) joined group db-172.20.1.77:28346-20080212093119764+0000 at request id 0
12 Feb 2008 09:31:58,027.....
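
To line up the two logs, here is a small Python sketch (a hypothetical helper, not part of Sequoia or Hedera) that parses entries in the "DD Mon YYYY HH:MM:SS,mmm | LEVEL | message" format shown above, merges both controllers' events and prints each event's offset from the first one. It assumes each entry sits on a single line, that the clocks of 172.20.1.77 and 172.20.1.81 are synchronized (nothing in the logs confirms this), and an English locale for the month name. The messages are abbreviated.

```python
from datetime import datetime

# A few entries copied from the two logs above (messages abbreviated).
RESTARTED = [
    "12 Feb 2008 09:31:20,065 | LEVEL3 | Storing checkpoint ... 172.20.1.77 joined group",
    "12 Feb 2008 09:31:20,111 | LEVEL3 | Storing checkpoint ... 172.20.1.77 quit group",
    "12 Feb 2008 09:31:22,080 | LEVEL1 | First controller in group db",
]
ALIVE = [
    "12 Feb 2008 09:30:56,413 | LEVEL3 | Storing checkpoint ... 172.20.1.81 quit group",
    "12 Feb 2008 09:31:19,765 | LEVEL3 | Storing checkpoint ... 172.20.1.81 joined group",
]


def parse(line):
    """Split 'DD Mon YYYY HH:MM:SS,mmm | LEVEL | message' into (timestamp, message)."""
    stamp, _level, message = (part.strip() for part in line.split("|", 2))
    # %f accepts the 3-digit millisecond field and pads it to microseconds.
    return datetime.strptime(stamp, "%d %b %Y %H:%M:%S,%f"), message


def timeline(*logs):
    """Merge (source_name, lines) pairs into one list of (timestamp, source, message) sorted by time."""
    events = [(ts, name, msg) for name, lines in logs for ts, msg in map(parse, lines)]
    return sorted(events)


if __name__ == "__main__":
    events = timeline(("restarted", RESTARTED), ("alive", ALIVE))
    start = events[0][0]
    for ts, source, msg in events:
        offset = (ts - start).total_seconds()
        print(f"+{offset:7.3f}s  {source:<9} {msg}")
```

On these excerpts it shows that the restarted controller recorded 172.20.1.77 joining and quitting within 46 ms, about two seconds before it declared itself the first controller in group db.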

How could this happen?

Is there something wrong with my JGroups configuration?
<config>
    <UDP mcast_port="%mcast_port%" 
         mcast_addr="228.8.8.9"
         tos="16"
         ucast_recv_buf_size="20000000"
         ucast_send_buf_size="640000"
         mcast_recv_buf_size="25000000" 
         mcast_send_buf_size="640000" 
         loopback="false"
         discard_incompatible_packets="true"
         max_bundle_size="64000"
         max_bundle_timeout="30"
         use_incoming_packet_handler="true" 
         use_outgoing_packet_handler="false" 
         ip_ttl="12" 
         bind_addr="%bind_addr%"
         bind_port="%bind_port%" 
         down_thread="false" up_thread="false"
         enable_bundling="true"
         diagnostics_port="%diagnostics_port%"/>
    <PING timeout="2000"
          down_thread="false" up_thread="false" num_initial_members="3"/>
    <MERGE2 max_interval="10000"
            down_thread="false" up_thread="false" min_interval="5000"/>
    <FD timeout="2500" max_tries="3" shun="false"/>
    <FD_SOCK down_thread="false" up_thread="false" start_port="%fd_sock_port%"/>
    <!--FD_ALL interval="3000" timeout="10000"/-->

    <!--VERIFY_SUSPECT timeout="1500" down_thread="false"/-->
    <pbcast.NAKACK max_xmit_size="60000"
                   use_mcast_xmit="false" gc_lag="0"
                   retransmit_timeout="100,200,300,600,1200,2400,4800"
                   down_thread="false" up_thread="false"
                   discard_delivered_msgs="true"/>
    <UNICAST timeout="300,600,1200,2400,3600"
             down_thread="false" up_thread="false"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" 
                   down_thread="false" up_thread="false"
                   max_bytes="400000"/>
    <VIEW_SYNC avg_send_interval="60000" down_thread="false" up_thread="false" />
    <pbcast.GMS print_local_addr="true" join_timeout="3000" 
                down_thread="false" up_thread="false"
                join_retry_timeout="2000" shun="true" 
                handle_concurrent_startup="true" />
    <SEQUENCER down_thread="false" up_thread="false" />
    <FC max_credits="2000000" down_thread="false" up_thread="false"
           min_threshold="0.10"/>
    <!-- FRAG2 frag_size="60000" down_thread="false" up_thread="true"/ -->
    <!-- pbcast.STATE_TRANSFER down_thread="false" up_thread="false"/-->
</config>

Any other clues?

Thanks in advance


Pierre






_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia
