Jay,

I would recommend trying the latest whitetank tip from svn. It fixes various bugs in checkpoint synchronization that could cause the problem you describe.

Regards
-steve

On Wed, 2009-03-25 at 16:20 +0800, jay_chen wrote:
> Dear All:
>
> I would like to know had this issue ever been fixed?
> I ask so because I encounter one problem looks like this.
> (0.80.3 and 0.80.4, with high network traffic load) Thanks.
>
> Jay...
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]
> Sent: Wednesday, October 03, 2007 5:25 AM
> To: [email protected]
> Subject: [Spam Mail]Re: [Openais] checkpoint disappears after node reset
> [This message is to be blocked by code: bkdh372119]
>
> I found out that ckpt_sync_activate() caused the checkpoint to be released.
> Does it give you an idea to come up with a patch? I am hacking here and
> there trying to work around the sync.
>
> Henry
>
>
> Jun 23 13:50:38.380459 [CKPT ] Got EXEC request to close checkpoint switchdrvr <=== switchover
> Jun 23 13:50:38.380575 [CKPT ] Close checkpoint->reference_count 0 <=== retention timer not started as we used SA_TIME_END, chkpt not release yet
> Jun 23 13:50:39.545654 [MAIN ] entering GATHER state from 12.
> Jun 23 13:50:39.689268 [MAIN ] got commit token
> Jun 23 13:50:39.689395 [MAIN ] Saving state aru 3917 high seq received 3917
> Jun 23 13:50:39.689560 [MAIN ] entering COMMIT state.
> Jun 23 13:50:39.690014 [CKPT ] Library request to open checkpoint.
> Jun 23 13:50:39.690286 [MAIN ] got commit token
> Jun 23 13:50:39.690350 [MAIN ] entering RECOVERY state.
> Jun 23 13:50:39.690529 [MAIN ] position [0] member 192.168.18.1:
> Jun 23 13:50:39.690602 [MAIN ] previous ring seq 12 rep 192.168.18.1
> Jun 23 13:50:39.690664 [MAIN ] aru 9 high delivered 9 received flag 0
> Jun 23 13:50:39.690741 [MAIN ] position [1] member 192.168.18.2:
> Jun 23 13:50:39.690809 [MAIN ] previous ring seq 8 rep 192.168.18.1
> Jun 23 13:50:39.690878 [MAIN ] aru 3917 high delivered 3917 received flag 0
> Jun 23 13:50:39.690955 [MAIN ] Did not need to originate any messages in recovery.
> Jun 23 13:50:39.691072 [MAIN ] Storing new sequence id for ring 10
> Jun 23 13:50:39.691247 [MAIN ] got commit token
>
>
> Jun 23 13:50:39.698345 [CLM ] r(0) ip(192.168.18.1)
> Jun 23 13:50:39.698479 [CLM ] r(0) ip(192.168.18.2)
> Jun 23 13:50:39.699078 [CLM ] Members Left:
> Jun 23 13:50:39.699153 [CLM ] Members Joined:
> Jun 23 13:50:39.699224 [EVT ] Evt conf change 0
> Jun 23 13:50:39.699279 [EVT ] m 2, j 0, l 0
> Jun 23 13:50:39.699349 [SYNC ] This node is within the primary component and will provide service.
> Jun 23 13:50:39.699522 [MAIN ] entering OPERATIONAL state.
> Jun 23 13:50:39.703321 [CKPT ] Executive request to open checkpoint 0x7faba400 <===
> Jun 23 13:50:39.703410 [CKPT ] CHECKPOINT opened is 0x105589d8
> Jun 23 13:50:39.705165 [CLM ] got nodejoin message 192.168.18.1
> Jun 23 13:50:39.705509 [CLM ] got nodejoin message 192.168.18.2
> Jun 23 13:50:39.712034 [CKPT ] checkpoint_section_release expiration timer = 0x(nil) <=== checkpoint and its sections are released
> Jun 23 13:50:39.712167 [CKPT ] checkpoint_section_release expiration timer = 0x(nil)
> Jun 23 13:50:39.712240 [CKPT ] checkpoint_section_release expiration timer = 0x(nil)
> Jun 23 13:50:39.712303 [CKPT ] checkpoint_section_release expiration timer = 0x(nil)

_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
