The OpenSAF calls return no errors, and no error logs are generated.
What problems would writing to the checkpoint in an AMF callback cause?

tony

> On Jan 7, 2016, at 10:23 PM, A V Mahesh <[email protected]> wrote:
>
> One more thing: ideally, operations such as writing a checkpoint are NOT
> recommended in callbacks.
> What exactly is your requirement?
>
> Is error handling done properly?
>
> -AVM
>
> On 1/8/2016 8:49 AM, A V Mahesh wrote:
>> Hi,
>>
>>>> (so the standby code does not run until the active code is done).
>>
>> If the above is the sequence of checkpoint writing, you should be having
>> the problem even with 40MB and higher.
>> Can you please cross-check for any system limitation, such as /dev/shm/,
>> etc.?
>>
>> By the way, which OpenSAF changeset are you using?
>>
>> -AVM
>>
>> On 1/8/2016 1:25 AM, Tony Hart wrote:
>>> Should also mention that this is using the synchronous API calls.
>>>
>>>> On Jan 7, 2016, at 10:55 AM, Tony Hart <[email protected]> wrote:
>>>>
>>>> OpenSAF 4.5.1
>>>>
>>>> We're seeing an issue where checkpoints are not syncing between two
>>>> nodes (the data in one is different from the other). There are two
>>>> separate nodes (A and B); one will have the active instance of the
>>>> process and the other the standby instance. The checkpoint is
>>>> created, opened and initialized in the active instance's AMF ACTIVE
>>>> callback. Then the checkpoint is opened in the standby instance's
>>>> AMF standby callback (so the standby code does not run until the
>>>> active code is done).
>>>>
>>>> NodeA
>>>> on_active() {
>>>>     Create a checkpoint with (SA_CKPT_WR_ALL_REPLICAS |
>>>>         SA_CKPT_CHECKPOINT_COLLOCATED)
>>>>     Initialize the checkpoint data (the first 32 bytes are filled
>>>>         with a pattern)
>>>> }
>>>>
>>>> NodeB
>>>> on_standby() {
>>>>     Open the same checkpoint
>>>>     Read the first 32 bytes and check for the fill pattern
>>>> }
>>>>
>>>> On NodeB what we occasionally see is that the check fails: instead
>>>> of reading the fill pattern it sees zeros. It doesn't matter how
>>>> long the checkpoint is left open; we never see the fill pattern.
>>>>
>>>> Here is a dump of the shared memory file from the two nodes. Our
>>>> data starts at 06448 (0xf33d). You can see that on the standby copy
>>>> it's zero.
>>>>
>>>> Other checkpoints work fine. The difference with this one is that
>>>> it's much bigger than the others (~20MB). If we increase the size of
>>>> the checkpoint to 40MB we see the failure all the time, so the
>>>> problem seems to be related to the size of the checkpoint.
>>>>
>>>> NodeA (active)
>>>> $ od -x -N 512 /dev/shm/opensaf_safCkpt\=SwitchMgr__69391_13
>>>> 0000000 000d 0000 0000 0000 0013 6173 4366 706b
>>>> 0000020 3d74 7753 7469 6863 674d 5f72 0035 0000
>>>> 0000040 0000 0000 0000 0000 0000 0000 0000 0000
>>>> *
>>>> 0000420 0009 0000 0000 0000 0020 02bc 0000 0000
>>>> 0000440 5800 f847 000d 0000 0001 0000 0000 0000
>>>> 0000460 0020 02bc 0000 0000 001a 0000 0000 0000
>>>> 0000500 0004 0000 0000 0000 0000 0000 0001 0000
>>>> 0000520 0000 0000 0101 0000 a031 bc91 0f0f 0001
>>>> 0000540 0000 0000 0000 0000 0000 0000 0000 0000
>>>> *
>>>> 0000600 0000 0000 0000 0000 0000 0000 0001 0000
>>>> 0000620 0020 02bc 0000 0000 0000 0000 0000 0000
>>>> 0000640 7f01 568e 0000 0000 f33d b33f 0578 0000
>>>> 0000660 8000 0000 0000 0000 0000 0000 0000 0000
>>>> 0000700 0000 0000 0000 0000 0000 0000 0000 0000
>>>> *
>>>> 0001000
>>>>
>>>> NodeB (standby)
>>>> $ od -x -N 512 /dev/shm/opensaf_safCkpt\=SwitchMgr__69647_13
>>>> 0000000 000d 0000 0000 0000 0013 6173 4366 706b
>>>> 0000020 3d74 7753 7469 6863 674d 5f72 0035 0000
>>>> 0000040 0000 0000 0000 0000 0000 0000 0000 0000
>>>> *
>>>> 0000420 0009 0000 0000 0000 0020 02bc 0000 0000
>>>> 0000440 5800 f847 000d 0000 0001 0000 0000 0000
>>>> 0000460 0020 02bc 0000 0000 001a 0000 0000 0000
>>>> 0000500 0004 0000 0000 0000 0000 0000 0001 0000
>>>> 0000520 0000 0000 0101 0000 a031 bc91 0f0f 0001
>>>> 0000540 0000 0000 0000 0000 0000 0000 0000 0000
>>>> *
>>>> 0000600 0000 0000 0000 0000 0000 0000 0001 0000
>>>> 0000620 0000 0000 0000 0000 0000 0000 0000 0000
>>>> *
>>>> 0001000
>>>>
>>>> ------------------------------------------------------------------------------
>>>> _______________________________________________
>>>> Opensaf-users mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-users
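Comparing the two replicas by eye, as in the `od` dumps above, gets error-prone once the data of interest sits deeper inside a ~20MB file. Below is a minimal sketch in C of the same comparison, assuming both replica files have first been copied to one host. `first_diff_offset` is a hypothetical helper, not an OpenSAF API. It returns the byte offset of the first mismatch, -1 if the compared range is identical, or -2 if either file cannot be opened.

```c
#include <assert.h>
#include <stdio.h>

/* Compare the first `limit` bytes of two replica files.
 * Returns the offset of the first differing byte, -1 if the compared
 * range is identical, or -2 if either file cannot be opened.
 * If one file ends before the other, the offset where it ended counts
 * as the first difference. */
long first_diff_offset(const char *path_a, const char *path_b, long limit)
{
    assert(path_a != NULL && path_b != NULL && limit >= 0);

    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long result = -1;

    if (a == NULL || b == NULL) {
        result = -2;
    } else {
        for (long off = 0; off < limit; off++) {
            int ca = fgetc(a);
            int cb = fgetc(b);
            if (ca != cb) {   /* byte mismatch, or only one file hit EOF */
                result = off;
                break;
            }
            if (ca == EOF)    /* both files ended together: no difference */
                break;
        }
    }

    if (a != NULL)
        fclose(a);
    if (b != NULL)
        fclose(b);
    return result;
}
```

Calling it with a limit of 512 covers the same window as `od -N 512`. Printing the result with `%07lo` gives the octal address format that `od` uses. On the two dumps shown, the first mismatch falls in the row labeled 0000620.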
