Hi Alex,

The CheckpointOpen call failing with SA_AIS_ERR_TIMEOUT is NOT a bug. It is expected if you pass too small a timeout value (`timeout = 1000000000`) to the saCkptCheckpointOpen(....,timeout ...) call when the ckpt has very large data/sections. Just increasing the timeout will avoid the SA_AIS_ERR_TIMEOUT.

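
To make the timeout point concrete: SaTimeT values are in nanoseconds, so 1000000000 is only one second. Below is a minimal sketch of sizing the timeout from the amount of checkpoint data instead of hard-coding it. The helper ckpt_open_timeout() and the bytes-per-second rate are assumptions made up for illustration; they are not part of the SAF API or of the attached test programs. The typedef and define are local stand-ins for what <saAis.h> provides, so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles without OpenSAF headers.
 * In a real application these come from <saAis.h>, where SaTimeT is a
 * signed 64-bit count of nanoseconds. */
typedef int64_t SaTimeT;
#define SA_TIME_ONE_SECOND 1000000000LL

/* Hypothetical helper (not part of the SAF API): estimate an open timeout
 * from the amount of checkpoint data that may have to be transferred.
 * bytes_per_second is an assumed, site-specific rate. */
static SaTimeT ckpt_open_timeout(int64_t total_bytes, int64_t bytes_per_second)
{
    assert(total_bytes >= 0 && bytes_per_second > 0);

    /* Round the estimated transfer time up to whole seconds. */
    int64_t transfer_seconds = total_bytes / bytes_per_second;
    if (total_bytes % bytes_per_second != 0)
        transfer_seconds++;

    /* Keep the original one-second value as headroom on top of the estimate. */
    return (transfer_seconds + 1) * SA_TIME_ONE_SECOND;
}
```

With, for example, 10000 sections of 1024k bytes each and an assumed rate of 100 MiB/s, this gives 101 seconds, which would then be passed as the timeout argument of saCkptCheckpointOpen(). The right rate depends on your cluster, so treat these numbers as placeholders.
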
Let us focus on your original issue/scenario: are you able to reproduce the problem with sectionCreationAttributes.expirationTime set to SA_TIME_ONE_DAY?

-AVM

On 1/7/2014 1:17 AM, Alex Jones wrote:
> AVM,
>
> I've been playing around with your test program, and have gotten
> it to fail.
>
> I made the following changes:
>
> 1. Change init_dataX to be 1024k bytes, so that you are initializing
>    the section to be 1024k.
> 2. Also, don't start the program on node B until A has finished
>    writing/creating all the sections.
> 3. Before hitting the enter key on node B, wait for the OpenAsync
>    call to finish.
>
> You might notice the CheckpointOpen call failing now with
> SA_AIS_ERR_TIMEOUT. I had to turn this into OpenAsync, and add a
> thread to process CkptDispatch messages. This uncovers another bug in
> OpenAsync. I've attached the mods to your program here.
>
> The OpenAsync callback will be called twice, both times with error
> == SA_AIS_ERR_TIMEOUT. If I call OpenAsync again when I get this
> error, the next callback returns success, but the callback gets called
> twice with success and with two different checkpoint handles!
>
> Alex
>
>
> On 01/06/2014 06:18 AM, A V Mahesh wrote:
>> Hi Alex,
>>
>> I have created 10K sections (please find the attached test
>> applications `Alex_test_node_A_app.c` & `Alex_test_node_B_app.c`)
>> with your specified scenario & configuration, and I haven't observed any
>> issue with sections on another node.
>>
>> Try to reproduce the problem on your setup & let me know the result.
>>
>> One more important point: what value did you configure for
>> `sectionCreationAttributes.expirationTime`?
>> I configured SA_TIME_ONE_DAY.
>>
>> Steps to run the application:
>>
>> ========================================================================
>>
>> Compile:
>>
>> NODE-A# gcc Alex_test_node_A_app.c -o checkpoint_A -lSaCkpt
>> NODE-A# gcc Alex_test_node_B_app.c -o checkpoint_B -lSaCkpt
>>
>>
>> Run:
>>
>> 1) saCkptCheckpointOpen On node A
>>
>> NODE-A# ./checkpoint_A
>>
>> CPSV:CPA:ONsaCkptSectionCreate Waiting to Create Sections
>> safCkpt=test_checkpoint_name1,safApp=safCkptService....
>> saCkptSectionCreate Press <Enter> key to continue...
>>
>> .
>> 2) saCkptCheckpointOpen() same ckpt On node B
>>
>> NODE-B# ./checkpoint_B
>>
>> CPSV:CPA:ONsaCkptSectionIterationInitialize Waiting to read Sections
>> safCkpt=test_checkpoint_name1,safApp=safCkptService....
>> saCkptActiveReplicaSet saCkptSectionIterationInitialize Press <Enter>
>> key to continue...
>>
>>
>> 3) saCkptSectionCreate() On node A and read saCkptCheckpointStatusGet()
>>
>> NODE-A#
>> checkpointStatus.numberOfSections : 10000
>> checkpointStatus.memoryUsed :756000
>> checkpointCreationAttributes.creationFlags;10
>> checkpointCreationAttributes.checkpointSize;10240000
>> checkpointCreationAttributes.retentionDuration;60000000000
>> checkpointCreationAttributes.maxSections;10000
>> checkpointCreationAttributes.maxSectionSize;1024
>> checkpointCreationAttributes.maxSectionIdSize;64
>> ================================
>> saCkptCheckpointUnlink / saCkptCheckpointClose / saCkptFinalize Press
>> <Enter> key to continue...
>> saCkptCheckpoint Press <Enter> key to continue...
>>
>>
>> 4) saCkptActiveReplicaSet() on node B and saCkptCheckpointStatusGet()
>>
>> NODE-B#
>> checkpointStatus.numberOfSections : 10000
>> checkpointStatus.memoryUsed :756000
>> checkpointCreationAttributes.creationFlags;10
>> checkpointCreationAttributes.checkpointSize;10240000
>> checkpointCreationAttributes.retentionDuration;60000000000
>> checkpointCreationAttributes.maxSections;10000
>> checkpointCreationAttributes.maxSectionSize;1024
>> checkpointCreationAttributes.maxSectionIdSize;64
>>
>> saCkptCheckpointUnlink / saCkptCheckpointClose / saCkptFinalize Press
>> <Enter> key to continue...
>> saCkptCheckpoint Press <Enter> key to continue..
>>
>> ========================================================================
>>
>> -AVM
>>
>>
>> On 1/6/2014 12:32 PM, A V Mahesh wrote:
>>> Hi Alex,
>>>
>>> We never tested with 7500 sections; we will test and let you know.
>>> Can you please share your test application? That will allow us to
>>> respond quickly.
>>>
>>> -AVM
>>>
>>> On 1/3/2014 8:23 PM, Alex Jones wrote:
>>>> Hello All,
>>>>
>>>> I'm experimenting with the checkpoint service, and some things
>>>> don't appear to work.
>>>>
>>>> The saCkptActiveReplicaSet and
>>>> saCkptCheckpointSynchronize[Async] don't appear to work when the
>>>> checkpoint has section numbers greater than around 5500.
>>>>
>>>> I've created a checkpoint with 7500 sections, each section being
>>>> 1024 bytes. The checkpoint is co-located and the "active replica"
>>>> bit is set.
>>>>
>>>> I can create and write all the sections. And from another node
>>>> I run saCkptCheckpointStatusGet, and the information all looks good.
>>>> Everything is there. I see no errors from any CKPT API calls.
>>>>
>>>> The problem comes when I call saCkptActiveReplicaSet from this
>>>> other node. After I do this, saCkptCheckpointStatusGet now returns
>>>> all the same information except the number of sections is no longer
>>>> 7500 but 0.
>>>> If I do this test with 50,000 sections, only about 3,000
>>>> entries get synced. And iterating through the sections shows that
>>>> there are only 3,000 sections.
>>>>
>>>> Calling saCkptCheckpointSynchronize[Async] in this situation has
>>>> no effect, either.
>>>>
>>>> After looking through the code I see a comment in
>>>> cpnd_evt_proc_ckpt_arep_set that says "/* ###TBD sync up is missing
>>>> with old active if now this fellow is becoming active. */" So, it
>>>> doesn't appear that syncing is being done in the
>>>> saCkptActiveReplicaSet, which it should be.
>>>>
>>>> Can someone comment?
>>>>
>>>> I'm going to fix this and post a patch unless someone else is
>>>> already working on it, but I didn't see a bug for it.
>>>>
>>>> Alex
>>>>
>>>> _______________________________________________
>>>> Opensaf-devel mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
