On Mon, 2009-04-13 at 10:38 +0800, jay_chen wrote:
> Dear All:
>
> I am using 0.80.4 and I have encountered a problem.
> I have two nodes on different devices and I checkpoint data from
> the master to the slave node.
> After they run for a long time, the master always returns
> SA_AIS_ERR_TRY_AGAIN. (The master is doing checkpoint write
> operations periodically.)
>
> Some observations:
> 1. sync_in_process( ) is always 1.
> 2. master & slave call sync_barrier_send( ) and it returns with res = 0
>    (which means ok)
> 3. master calls sync_deliver_fn( ) with the slave nodeid only
> 4. slave calls sync_deliver_fn( ) with the slave nodeid only
> 5. the sync state machine remains in the processing state forever
>
> Could anybody give me some hints to investigate further?
>

Logs around the time this happens would be helpful.
Is the THROW_AWAY code in totempg throwing away the first message in the
new configuration? What service is syncing when this barrier occurs? What
I/O load is taking place (transactions per second)?

regards
-steve

> Thanks.
>
> Jay Chen.
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
