Hi Alex,

The CheckpointOpen call failing with SA_AIS_ERR_TIMEOUT is NOT a bug. It is 
expected if you pass too small a timeout value (`timeout = 1000000000`)
to the saCkptCheckpointOpen(....,timeout ...) call when the ckpt has very large 
data/sections. Just increasing the timeout will avoid the SA_AIS_ERR_TIMEOUT.
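
To make the timeout arithmetic concrete, here is a rough sketch (not taken from the test programs). SaTimeT values are in nanoseconds, so `1000000000` is only one second. The typedef and constants below are local stand-ins for what `saAis.h` provides, and the helpers `timeout_in_seconds` and `open_timeout_for` are hypothetical names with purely illustrative scaling numbers, not OpenSAF tuning advice.

```c
#include <stdint.h>

/* Local stand-ins so this compiles without the SAF headers.
 * In a real application these come from <saAis.h>. */
typedef int64_t SaTimeT;                     /* AIS durations are in nanoseconds */
#define SA_TIME_ONE_SECOND 1000000000LL
#define SA_TIME_ONE_MINUTE 60000000000LL

/* Hypothetical helper: whole seconds contained in an SaTimeT duration. */
static long long timeout_in_seconds(SaTimeT t)
{
    return (long long)(t / SA_TIME_ONE_SECOND);
}

/* Hypothetical helper: one second base plus one second per MiB of
 * checkpoint data (rounded up), capped at one minute.
 * The scaling factor is an assumption chosen only for illustration. */
static SaTimeT open_timeout_for(uint64_t ckpt_bytes)
{
    const uint64_t mib = 1024ULL * 1024ULL;
    uint64_t mibs = ckpt_bytes / mib + (ckpt_bytes % mib != 0 ? 1 : 0);

    if (mibs >= 59)                          /* cap first, so the multiply cannot overflow */
        return SA_TIME_ONE_MINUTE;
    return SA_TIME_ONE_SECOND * (SaTimeT)(1 + mibs);
}

/* Intended use (not compiled here, needs the real SAF library):
 *   rc = saCkptCheckpointOpen(ckptHandle, &ckptName, &creationAttrs,
 *                             openFlags, open_timeout_for(ckptSize),
 *                             &checkpointHandle);
 */
```

With the 10240000-byte checkpointSize from the earlier test output, `open_timeout_for` gives 11 seconds instead of the 1 second that was passed.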

Let us focus on your original issue/scenario: are you able to reproduce 
the problem with sectionCreationAttributes.expirationTime set to 
SA_TIME_ONE_DAY?

-AVM

On 1/7/2014 1:17 AM, Alex Jones wrote:
> AVM,
>
>     I've been playing around with your test program, and have gotten 
> it to fail.
>
>     I made the following changes:
>
>  1. Change init_dataX to be 1024k bytes, so that you are initializing
>     the section to be 1024k.
>  2. Also, don't start the program on node B until A has finished
>     writing/creating all the sections.
>  3. Before hitting the enter key on node B, wait for the OpenAsync
>     call to finish.
>
>     You might notice the CheckpointOpen call failing now with 
> SA_AIS_ERR_TIMEOUT.  I had to turn this into OpenAsync, and add a 
> thread to process CkptDispatch messages.  This uncovers another bug in 
> OpenAsync.  I've attached the mods to your program here.
>
>    The OpenAsync callback will be called twice, both times with error 
> == SA_AIS_ERR_TIMEOUT.  If I call OpenAsync again when I get this 
> error, the next callback returns success, but the callback gets called 
> twice with success and with two different checkpoint handles!
>
> Alex
>
>
> On 01/06/2014 06:18 AM, A V Mahesh wrote:
>> Hi Alex,
>>
>> I have created 10K sections (please find the attached test
>> applications `Alex_test_node_A_app.c` & `Alex_test_node_B_app.c`)
>> with your specified scenario & configuration, and I haven't observed any
>> issue with the sections on another node.
>>
>> Try to reproduce the problem on your setup & let me know the result.
>>
>> One more important point: what value did you configure for
>> `sectionCreationAttributes.expirationTime`?
>> I configured SA_TIME_ONE_DAY.
>>
>> Steps to run the application:
>>
>> ===================================================================================================================
>>
>> Compile :
>>
>> NODE-A# gcc Alex_test_node_A_app.c -o checkpoint_A -lSaCkpt
>> NODE-A# gcc Alex_test_node_B_app.c -o checkpoint_B -lSaCkpt
>>
>>
>> Run :
>>
>> 1) saCkptCheckpointOpen On node A
>>
>> NODE-A# ./checkpoint_A
>>
>> CPSV:CPA:ONsaCkptSectionCreate  Waiting to Create Sections
>> safCkpt=test_checkpoint_name1,safApp=safCkptService....
>> saCkptSectionCreate Press <Enter> key to continue...
>>
>> .
>> 2) saCkptCheckpointOpen() of the same ckpt on node B
>>
>> NODE-B# ./checkpoint_B
>>
>> CPSV:CPA:ONsaCkptSectionIterationInitialize Waiting to read Sections
>> safCkpt=test_checkpoint_name1,safApp=safCkptService....
>> saCkptActiveReplicaSet saCkptSectionIterationInitialize Press <Enter>
>> key to continue...
>>
>>
>> 3) saCkptSectionCreate() On node A  and read saCkptCheckpointStatusGet()
>>
>> NODE-A#
>>    checkpointStatus.numberOfSections : 10000
>>    checkpointStatus.memoryUsed :756000
>>     checkpointCreationAttributes.creationFlags;10
>>    checkpointCreationAttributes.checkpointSize;10240000
>>    checkpointCreationAttributes.retentionDuration;60000000000
>>    checkpointCreationAttributes.maxSections;10000
>>    checkpointCreationAttributes.maxSectionSize;1024
>>    checkpointCreationAttributes.maxSectionIdSize;64
>>    ================================
>> saCkptCheckpointUnlink / saCkptCheckpointClose / saCkptFinalize Press
>> <Enter> key to continue...
>> saCkptCheckpoint Press <Enter> key to continue...
>>
>>
>> 4) saCkptActiveReplicaSet() on node B and read saCkptCheckpointStatusGet()
>>
>> NODE-B#
>>    checkpointStatus.numberOfSections : 10000
>>    checkpointStatus.memoryUsed :756000
>>     checkpointCreationAttributes.creationFlags;10
>>    checkpointCreationAttributes.checkpointSize;10240000
>>    checkpointCreationAttributes.retentionDuration;60000000000
>>    checkpointCreationAttributes.maxSections;10000
>>    checkpointCreationAttributes.maxSectionSize;1024
>>    checkpointCreationAttributes.maxSectionIdSize;64
>>
>>    saCkptCheckpointUnlink / saCkptCheckpointClose / saCkptFinalize Press
>> <Enter> key to continue...
>>    saCkptCheckpoint Press <Enter> key to continue..
>>
>> ================================================================================================================================
>>
>> -AVM
>>
>>
>> On 1/6/2014 12:32 PM, A V Mahesh wrote:
>>> Hi Alex,
>>>
>>> We never tested 7500 sections; we will test and let you know.
>>> Can you please share your test application?
>>> That would allow us to respond quickly.
>>>
>>> -AVM
>>>
>>> On 1/3/2014 8:23 PM, Alex Jones wrote:
>>>> Hello All,
>>>>
>>>>       I'm experimenting with the checkpoint service, and some things
>>>> don't appear to work.
>>>>
>>>>       The saCkptActiveReplicaSet and
>>>> saCkptCheckpointSynchronize[Async] don't appear to work when the
>>>> checkpoint has section numbers greater than around 5500.
>>>>
>>>>       I've created a checkpoint with 7500 sections, each section being
>>>> 1024 bytes.  The checkpoint is co-located and the "active replica"
>>>> bit is set.
>>>>
>>>>       I can create and write all the sections.  And from another node
>>>> I run saCkptCheckpointStatusGet, and the information all looks good.
>>>> Everything is there.  I see no errors from any CKPT API calls.
>>>>
>>>>       The problem comes when I call saCkptActiveReplicaSet from this
>>>> other node.  After I do this, saCkptCheckpointStatusGet now returns
>>>> all the same information except the number of sections is no longer
>>>> 7500 but 0.  If I do this test with 50,000 sections only about 3,000
>>>> entries get synced.  And iterating through the sections shows that
>>>> there are only 3,000 sections.
>>>>
>>>>       Calling saCkptCheckpointSynchronize[Async] in this situation has
>>>> no effect, either.
>>>>
>>>>       After looking through the code I see a comment in
>>>> cpnd_evt_proc_ckpt_arep_set that says "/* ###TBD sync up is missing
>>>> with old active if now this fellow is becoming active. */"  So, it
>>>> doesn't appear that syncing is being done in the
>>>> saCkptActiveReplicaSet, which it should be.
>>>>
>>>>       Can someone comment?
>>>>
>>>>       I'm going to fix this and post a patch unless someone else is
>>>> already working on it, but I didn't see a bug for it.
>>>>
>>>> Alex
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> Rapidly troubleshoot problems before they affect your business. Most IT
>>>> organizations don't have a clear picture of how application performance
>>>> affects their revenue. With AppDynamics, you get 100% visibility into
>>>> your
>>>> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of
>>>> AppDynamics Pro!
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
>>>>   
>>>>
>>>> _______________________________________________
>>>> Opensaf-devel mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>
