On Wed, 2010-02-17 at 16:40 -0700, hj lee wrote:
> 
> On Fri, Feb 12, 2010 at 11:17 PM, Steven Dake <[email protected]>
> wrote: 
>         
>         On Fri, 2010-02-12 at 12:51 -0700, hj lee wrote:
>         > Hi,
>         >
>         > If there are only two nodes in the cluster and their IP addresses
>         > are known a priori, then isn't it better to use TCP as the
>         > transport layer? With heartbeat, there is a way to configure nodes
>         > before starting the cluster. Was TCP ever considered in corosync?
>         >
>         
>         
>         Not sure why TCP would be a better transport layer for two nodes.
>         TCP/IP's key driving design factor from DARPA was to remain
>         operational and _mask faults_ (not detect faults) under nuclear
>         attack, where many network links and routing systems would be under
>         considerable changing stress.  As a result TCP/IP is very resilient
>         to faulty networks and packet loss but does not provide suitable
>         fault detection.  Further, there is no automatic node discovery in
>         TCP/IP.  In short, TCP/IP, while highly versatile, doesn't offer the
>         best characteristics for cluster communication.
>         
>         Finally, Corosync is designed for nway redundant cluster
>         configurations.  The 2N model is a simplification of the nway
>         redundant model and we don't provide special behaviors during 2N
>         operation.
>         
>         Regards
>         -steve
> 
> Actually I changed the code to use TCP just for sending the token, and
> it is working very well. "Very well" means I do not see token timeouts
> any more. I know this is an ugly hack! The main reason I want to use TCP
> is that I am seeing token lost timeouts under heavy load, so the cluster
> is divided for a very short time, which caused some problems in my
> application. If I run the system overnight, I usually see one or two
> token lost timeouts. The easy fix would be to increase the token
> timeout, but I have a strict requirement on the timeout, so I couldn't
> increase it. So I tried TCP. After adding a TCP transport just for
> token transmission, this timeout does not happen any more.
> 
> The token is transmitted by unicast, so this change will work with
> more than two nodes. And I think there will be cases or environments
> where this TCP token transmission may be useful or work better. At
> least it solves my case.
> 
> Thanks
> hj
> 
> 

Did you try increasing (from man page): 
       token_retransmits_before_loss_const
              This value identifies how many token retransmits should be
              attempted before forming a new configuration.  If this value
              is set, retransmit and hold will be automatically calculated
              from retransmits_before_loss and token.

              The default is 4 retransmissions.
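
As a rough sketch of what that might look like in corosync.conf (the
numbers below are illustrative assumptions, not recommendations; the
token value stands in for whatever your strict timeout requirement is):

```
totem {
        version: 2

        # Token timeout in milliseconds. Left unchanged here so the
        # strict failure-detection requirement is still met
        # (1000 is only a placeholder value).
        token: 1000

        # Raised from the default of 4 so that a lost token is
        # retransmitted more times within the same token timeout
        # before a new configuration is formed (10 is an assumed
        # example value).
        token_retransmits_before_loss_const: 10
}
```

Since retransmit and hold are derived from these two values, raising
the retransmit count should give a lost token more chances to get
through under load without lengthening the token timeout itself.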


_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
