Hi Anders Widell Ok I will bump the minor version of the CKPT API, or to use ELF symbol version as you suggested.
-AVM On 12/29/2015 3:49 PM, Anders Widell wrote: > Hi! > > See my comments marked [AndersW] inline. > > regards, > Anders Widell > > On 12/28/2015 05:35 AM, A V Mahesh wrote: >> Hi Anders Widell, >> >> Please find my answers. >> >> On 12/22/2015 10:41 PM, Anders Widell wrote: >>> See comments inline. >>> >>> regards, >>> Anders Widell >>> >>> On 12/21/2015 11:04 AM, A V Mahesh wrote: >>>> Hi, >>>> >>>> - The timeout here is the duration for which the API should wait >>>> before returning to the user. >>>> - The patch does not changes the minimum value of 14 seconds. IE. >>>> Before this patch also the default value was 14 seconds. >>>> This value was set based on implementation specific fine tuning, >>>> particularly in large size cluster. >>> I think it does change the minimum timeout. I am here referring to >>> the functions that take an SaTimeT timeout parameter. You can try >>> this example program, that used to work before ticket [#1583] was >>> pushed: >>> >>> const SaNameT checkpointName = {strlen("safCkpt=hello"), >>> "safCkpt=hello"}; >>> do { >>> err = saCkptCheckpointOpen( >>> ckptHandle, >>> &checkpointName, >>> &checkpointCreationAttributes, >>> SA_CKPT_CHECKPOINT_READ | >>> SA_CKPT_CHECKPOINT_WRITE | >>> SA_CKPT_CHECKPOINT_CREATE, >>> 10 * SA_TIME_ONE_SECOND, >>> &checkpointHandle); >>> >> [AVM] >> >> Even though previously ( before this patch) >> Application was passing 10 * SA_TIME_ONE_SECOND ( ) >> because of #1583 reported real issue of `SaTimeT >> timeout parameter is not being converted in to TEN_MILLI_SEC() while >> passing to mds_msg_sync_send()` >> internally CPA was assuming Application is passed >> 10 SECOND * 10000000 ( 10 * 1000000000 * 10000000) ,which is being >> treated as larger than the default >> timeout time of 14 SECOND, so previously the >> Application only has an impression that it will get time out in 10 >> SECOND , but in piratical it was considered larger >> than 14 SECOND default time. > [AndersW] Yes this is a bug and I am not arguing against fixing it. I > don't think a fix for this would be backwards incompatible, since an > application shouldn't depend on a bug in OpenSAF (we don't need to be > bug-backwards-compatible, except perhaps in some unusual situations). > But this is not the problem that I am talking about. The problem I am > talking about is that the range of allowed timeout values has become > smaller as part of the fix. Such a change is typically not backwards > compatible, since an application could be using a timeout value which > lies within the previously allowed range of timeout values, but which > is outside the new (smaller) range of allowed values. The example > program I provided demonstrates this problem. It works before [#1583] > was pushed, but not afterwards. The program uses the checkpoint API in > a perfectly valid and reasonable way. Thus, it provides a clear > evidence that the change is not backwards compatible. If you don't > believe me, please try the program for yourself. >> >> ================================================================================================================= >> >> >> Before #1583 Fix : >> saCkptCheckpointOpen( ckptHandle, >> &checkpointName, ....., 10 * SA_TIME_ONE_SECOND, &checkpointHandle); >> timeout = 10 * SA_TIME_ONE_SECOND; >> proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >> &(cb->cpnd_mds_dest), &evt, &out_evt, timeout); >> >> After #1583 Fix : >> timeout = 10 * SA_TIME_ONE_SECOND; >> time_out = >> m_CPSV_CONVERT_SATIME_TEN_MILLI_SEC(timeout); >> proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >> &(cb->cpnd_mds_dest), &evt, &out_evt, time_out); >> >> ================================================================================================================= >> >> >> >> In other words , before this patch in reality the >> saCkptCheckpointOpen() was not being returned TIMEOUT even after 10 >> SECOND ( but Application expecting it). >> Application was only in assumption it was passing 10 >> SECOND , so no change in behavioral difference in Application before >> and after this patch. > [AndersW] As I said above, this is not the part of the change that I > have problems with. It is the following passage: > > Before (note: NCS_SAF_MIN_ACCEPT_TIME has the value 10): > > if (time_out < NCS_SAF_MIN_ACCEPT_TIME) { > rc = SA_AIS_ERR_TIMEOUT; > TRACE_4("Cpa CkptOpen failed Api failed with return > value:%d,ckptHandle:%llx,timeout:%llu", rc, ckptHandle, timeout); > goto mds_send_fail; > } > > > After (note: CPSV_WAIT_TIME has the value 1400): > > if (time_out < CPSV_WAIT_TIME) { > rc = SA_AIS_ERR_TIMEOUT; > TRACE_4("Cpa CkptOpen failed Api failed with return > value:%d,ckptHandle:%llx,timeout:%llu", > rc, ckptHandle, timeout); > goto mds_send_fail; > } > > So yes, there is a change in behaviour from the application's point of > view. If the timeout (after conversion to a unit of 10 milliseconds) > falls within the range 10 to 1400, the application will now always get > the SA_AIS_ERR_TIMEOUT. Previously, the function may have succeeded. >> >>> >>>> - Even though it is not a practical case of passing 49 days >>>> (greater than or equal to 0x98968000000000) as a timeout value, yes >>>> what you say can be done across all services. >>> Shouldn't we then document 0x98968000000000 as the upper limit for >>> the timeout parameter, and validate that the parameter given by the >>> user is less than this upper limit. As it is coded right now, the >>> function will unconditionally return SA_AIS_ERR_TIMEOUT if you >>> specify 0x98968000000000 as timeout (you can try this by replacing >>> 10 * SA_TIME_ONE_SECOND with 0x98968000000000 in the example program >>> above). >> >> [AVM] Ok ,as if this issue exist with all service , I will raise a >> new ticket and will be fixed/document for all services point of view . > [AndersW] Instead of documenting this as a limitation, please fix MDS > by changing the size of the timeout parameter from 32 bits to 64 bits. > Note that there are perfectly reasonable timeout values that are very > large, e.g. someone might be using SA_TIME_MAX to get a (for all > practical purposes) infinite timeout. >> >> -AVM >>>> >>>> Thanks, >>>> AV. >>>> >>>> >>>> On 12/18/2015 2:43 PM, Anders Widell wrote: >>>>> Hi! >>>>> >>>>> I have some questions regarding this change: >>>>> >>>>> 1) This change is not backwards compatible. Have you considered to >>>>> either bump the minor version of the CKPT API, or to use ELF >>>>> symbol versioning, to avoid breaking old applications? >>>>> >>>>> 2) Why was the minimum possible time-out increased from 0.1 >>>>> seconds to 14 seconds? 14 seconds is like an eternity in a >>>>> real-time system. >>>>> >>>>> 3) If the user-specified timeout is less than 14 seconds, the >>>>> functions will always fail without even trying. Wouldn't it be >>>>> better to at least try to perform the operation within the time >>>>> limit specified by the user? The user may have a loop where it >>>>> keeps retrying the API function call when the SA_AIS_ERR_TIMEOUT >>>>> error code is returned. I would expect such a loop to eventually >>>>> terminate, especially if the timeout parameter has a reasonable >>>>> value (and there are many reasonable timeout values which are less >>>>> than 14 seconds). >>>>> >>>>> 3) Can't the conversion from nanoseconds represented as a signed >>>>> 64-bit integer to a unit of 10 milliseconds stored in an unsigned >>>>> 32-bit integer ('uint32_t time_out' in the code below) overflow? >>>>> I.e. if the timeout parameter has a value greater than or equal to >>>>> 0x98968000000000? >>>>> >>>>> / Anders Widell >>>>> >>>>> On 11/06/2015 06:33 AM, mahesh.va...@oracle.com wrote: >>>>>> osaf/libs/agents/saf/cpa/cpa_api.c | 111 >>>>>> ++++++++++++++++++++++++++++++------ >>>>>> 1 files changed, 91 insertions(+), 20 deletions(-) >>>>>> >>>>>> >>>>>> If we test upper and lower limits of SaTimeT timeout arguments to >>>>>> CPSV API , >>>>>> some of CPA APIs are doesn't returns or hungs. >>>>>> >>>>>> Current cpa code doesn't validating or passing SaTimeT timeout >>>>>> arguments to CPSV API , >>>>>> which are below default API timeout limit of CPSV_WAIT_TIME (1400 >>>>>> X 10ms) and >>>>>> above SaTimeT, type long long int Maximum value >>>>>> 9223372036854775807 (2^63-1) >>>>>> >>>>>> For example : >>>>>> >>>>>> i) const SaTimeT timeout = 1215752192; >>>>>> ii) const SaTimeT timeout = 2 * SA_TIME_ONE_MINUTE; >>>>>> iii) const SaTimeT timeout = 9223372036854775809; >>>>>> >>>>>> saCkptCheckpointOpen(...., timeout, ...); >>>>>> >>>>>> Because of that in some of variables are over flowing and , API >>>>>> calls are not working properly >>>>>> >>>>>> Fix : >>>>>> >>>>>> 1 - Validated if (time_out < CPSV_WAIT_TIME) return >>>>>> SA_AIS_ERR_INVALID_PARAM >>>>>> 2 - Validates if (expirationTime < 0 ) return >>>>>> SA_AIS_ERR_INVALID_PARAM >>>>>> 3 - Pass proper timeout values to internal functions of Opensaf >>>>>> >>>>>> diff --git a/osaf/libs/agents/saf/cpa/cpa_api.c >>>>>> b/osaf/libs/agents/saf/cpa/cpa_api.c >>>>>> --- a/osaf/libs/agents/saf/cpa/cpa_api.c >>>>>> +++ b/osaf/libs/agents/saf/cpa/cpa_api.c >>>>>> @@ -27,7 +27,6 @@ >>>>>> ******************************************************************************/ >>>>>> >>>>>> >>>>>> #include "cpa.h" >>>>>> -#define NCS_SAF_MIN_ACCEPT_TIME 10 >>>>>> /**************************************************************************** >>>>>> >>>>>> >>>>>> Name : SaCkptInitialize >>>>>> @@ -889,6 +888,14 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH >>>>>> return SA_AIS_ERR_INVALID_PARAM; >>>>>> } >>>>>> + if (timeout < 0 ) { >>>>>> + TRACE_4("Cpa CkptOpen: failed invalid timeout return >>>>>> value:%d,ckptHandle:%llx", >>>>>> + SA_AIS_ERR_INVALID_PARAM, ckptHandle); >>>>>> + TRACE_LEAVE2("API return code = %u", rc); >>>>>> + return SA_AIS_ERR_INVALID_PARAM; >>>>>> + } >>>>>> + >>>>>> + >>>>>> /* Draft Validations */ >>>>>> rc = cpa_open_attr_validate(checkpointCreationAttributes, >>>>>> checkpointOpenFlags, checkpointName); >>>>>> if (rc != SA_AIS_OK) { >>>>>> @@ -975,13 +982,13 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH >>>>>> } >>>>>> evt.info.cpnd.info.openReq.ckpt_flags = checkpointOpenFlags; >>>>>> - >>>>>> /* convert the timeout to 10 ms value and add it to the >>>>>> sync send timeout */ >>>>>> time_out = m_CPSV_CONVERT_SATIME_TEN_MILLI_SEC(timeout); >>>>>> - if (time_out < NCS_SAF_MIN_ACCEPT_TIME) { >>>>>> + if (time_out < CPSV_WAIT_TIME) { >>>>>> rc = SA_AIS_ERR_TIMEOUT; >>>>>> - TRACE_4("Cpa CkptOpen failed Api failed with return >>>>>> value:%d,ckptHandle:%llx,timeout:%llu", rc, ckptHandle, timeout); >>>>>> + TRACE_4("Cpa CkptOpen failed Api failed with return >>>>>> value:%d,ckptHandle:%llx,timeout:%llu", >>>>>> + rc, ckptHandle, timeout); >>>>>> goto mds_send_fail; >>>>>> } >>>>>> @@ -998,9 +1005,8 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH >>>>>> TRACE_4("cpa CkptOpen:CPND_DOWN Api failed with return >>>>>> value:%d,ckptHandle:%llx", rc, ckptHandle); >>>>>> goto mds_send_fail; >>>>>> } >>>>>> - >>>>>> /* Send the evt to CPND */ >>>>>> - proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >>>>>> &(cb->cpnd_mds_dest), &evt, &out_evt, timeout); >>>>>> + proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >>>>>> &(cb->cpnd_mds_dest), &evt, &out_evt, time_out); >>>>>> /* Generate rc from proc_rc */ >>>>>> switch (proc_rc) { >>>>>> @@ -1709,6 +1715,13 @@ SaAisErrorT saCkptCheckpointRetentionDur >>>>>> goto done; >>>>>> } >>>>>> + if (retentionDuration < 0 ) { >>>>>> + TRACE_4("Cpa RetentionDurationSet: failed >>>>>> invalid retentionDuration return value:%d,ckptHandle:%llx", >>>>>> + SA_AIS_ERR_INVALID_PARAM, checkpointHandle); >>>>>> + rc = SA_AIS_ERR_INVALID_PARAM; >>>>>> + goto done; >>>>>> + } >>>>>> + >>>>>> /* get the CB Lock */ >>>>>> if (m_NCS_LOCK(&cb->cb_lock, NCS_LOCK_WRITE) != >>>>>> NCSCC_RC_SUCCESS) { >>>>>> rc = SA_AIS_ERR_LIBRARY; >>>>>> @@ -2183,6 +2196,7 @@ SaAisErrorT saCkptSectionCreate(SaCkptCh >>>>>> bool gen_sec_flag = false; >>>>>> SaCkptSectionIdT app_ptr; >>>>>> CPA_CLIENT_NODE *cl_node = NULL; >>>>>> + uint32_t time_out; >>>>>> TRACE_ENTER2("SaCkptCheckpointHandleT passed is >>>>>> %llx",checkpointHandle); >>>>>> /* Validate the Input Parameters */ >>>>>> @@ -2312,8 +2326,21 @@ SaAisErrorT saCkptSectionCreate(SaCkptCh >>>>>> evt.info.cpnd.info.sec_creatReq.init_size = initialDataSize; >>>>>> m_NCS_UNLOCK(&cb->cb_lock, NCS_LOCK_WRITE); >>>>>> + >>>>>> + time_out = CPA_WAIT_TIME(initialDataSize); >>>>>> + >>>>>> + if (time_out < 0 ) { >>>>>> + rc = SA_AIS_ERR_TIMEOUT; >>>>>> + TRACE_4("cpa SectCreate Api failed because of larg >>>>>> initialDataSize return :%d,ckptHandle:%llx", >>>>>> + rc, checkpointHandle); >>>>>> + goto fail1; >>>>>> + } >>>>>> + else if (time_out < CPSV_WAIT_TIME) { >>>>>> + time_out = CPSV_WAIT_TIME; >>>>>> + } >>>>>> + >>>>>> proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >>>>>> &(gc_node->active_mds_dest), >>>>>> - &evt, &out_evt, >>>>>> CPA_WAIT_TIME(initialDataSize)); >>>>>> + &evt, &out_evt, time_out); >>>>>> /* Generate rc from proc_rc */ >>>>>> switch (proc_rc) { >>>>>> @@ -2684,6 +2711,14 @@ SaAisErrorT saCkptSectionExpirationTimeS >>>>>> return rc; >>>>>> } >>>>>> + if (expirationTime < 0 ) { >>>>>> + TRACE_4("Cpa ExpirationTimeSet: failed invalid >>>>>> expirationTime return value:%d,ckptHandle:%llx", >>>>>> + SA_AIS_ERR_INVALID_PARAM, checkpointHandle); >>>>>> + rc = SA_AIS_ERR_INVALID_PARAM; >>>>>> + TRACE_LEAVE2("API return code = %u", rc); >>>>>> + return rc; >>>>>> + } >>>>>> + >>>>>> /* retrieve CPA CB */ >>>>>> m_CPA_RETRIEVE_CB(cb); >>>>>> if (!cb) { >>>>>> @@ -2847,6 +2882,14 @@ SaAisErrorT saCkptSectionIterationInitia >>>>>> return rc; >>>>>> } >>>>>> + if (expirationTime < 0 ) { >>>>>> + TRACE_4("Cpa SectionIterationInitialize: failed invalid >>>>>> expirationTime return value:%d,ckptHandle:%llx", >>>>>> + SA_AIS_ERR_INVALID_PARAM, checkpointHandle); >>>>>> + rc = SA_AIS_ERR_INVALID_PARAM; >>>>>> + TRACE_LEAVE2("API return code = %u", rc); >>>>>> + return rc; >>>>>> + } >>>>>> + >>>>>> /* retrieve CPA CB */ >>>>>> m_CPA_RETRIEVE_CB(cb); >>>>>> if (!cb) { >>>>>> @@ -3360,6 +3403,7 @@ SaAisErrorT saCkptCheckpointWrite(SaCkpt >>>>>> CPA_CLIENT_NODE *cl_node = NULL; >>>>>> SaSizeT all_ioVector_size = 0; >>>>>> uint32_t err_flag = 0; >>>>>> + uint32_t time_out; >>>>>> >>>>>> TRACE_ENTER2("SaCkptCheckpointHandleT passed is >>>>>> %llx",checkpointHandle); >>>>>> @@ -3479,18 +3523,27 @@ SaAisErrorT saCkptCheckpointWrite(SaCkpt >>>>>> /* Unlock cpa_lock before calling mds api */ >>>>>> m_NCS_UNLOCK(&cb->cb_lock, NCS_LOCK_WRITE); >>>>>> -#if 1 >>>>>> - for (iter = 0; iter < numberOfElements; iter++) >>>>>> + >>>>>> + for (iter = 0; iter < numberOfElements; iter++) { >>>>>> all_ioVector_size += ioVector[iter].dataSize; >>>>>> - >>>>>> - proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >>>>>> &(gc_node->active_mds_dest), >>>>>> - &evt, &out_evt, >>>>>> CPA_WAIT_TIME(all_ioVector_size)); >>>>>> -#else >>>>>> - proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >>>>>> &(gc_node->active_mds_dest), &evt, &out_evt, CPSV_WAIT_TIME); >>>>>> -#endif >>>>>> - /* Generate rc from proc_rc */ >>>>>> - switch (proc_rc) >>>>>> - { >>>>>> + time_out = CPA_WAIT_TIME(all_ioVector_size); >>>>>> + >>>>>> + if (time_out < 0 ) { >>>>>> + rc = SA_AIS_ERR_TIMEOUT; >>>>>> + TRACE_4("cpa ckpointWrite Api failed because of larg >>>>>> ioVector size return :%d,ckptHandle:%llx", >>>>>> + rc, checkpointHandle); >>>>>> + goto done; >>>>>> + } >>>>>> + else if (time_out < CPSV_WAIT_TIME) { >>>>>> + time_out = CPSV_WAIT_TIME; >>>>>> + } >>>>>> + proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >>>>>> &(gc_node->active_mds_dest), >>>>>> + &evt, &out_evt, time_out); >>>>>> + >>>>>> + } >>>>>> + /* Generate rc from proc_rc */ >>>>>> + switch (proc_rc) >>>>>> + { >>>>>> case NCSCC_RC_SUCCESS: >>>>>> break; >>>>>> case NCSCC_RC_REQ_TIMOUT: >>>>>> @@ -3577,6 +3630,7 @@ SaAisErrorT saCkptSectionOverwrite(SaCkp >>>>>> CPA_GLOBAL_CKPT_NODE *gc_node = NULL; >>>>>> bool add_flag = false; >>>>>> CPA_CLIENT_NODE *cl_node = NULL; >>>>>> + uint32_t time_out; >>>>>> TRACE_ENTER2("SaCkptCheckpointHandleT passed is >>>>>> %llx",checkpointHandle); >>>>>> memset(&ckpt_data, '\0', sizeof(CPSV_CKPT_DATA)); >>>>>> @@ -3684,8 +3738,19 @@ SaAisErrorT saCkptSectionOverwrite(SaCkp >>>>>> /* Unlock cpa_lock before calling mds api */ >>>>>> m_NCS_UNLOCK(&cb->cb_lock, NCS_LOCK_WRITE); >>>>>> + time_out = CPA_WAIT_TIME(dataSize); >>>>>> + >>>>>> + if (time_out < 0 ) { >>>>>> + rc = SA_AIS_ERR_TIMEOUT; >>>>>> + TRACE_4("cpa SectionOverwrite Api failed because of larg >>>>>> dataSize return :%d,ckptHandle:%llx", rc, checkpointHandle); >>>>>> + goto done; >>>>>> + } >>>>>> + else if (time_out < CPSV_WAIT_TIME) { >>>>>> + time_out = CPSV_WAIT_TIME; >>>>>> + } >>>>>> + >>>>>> proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >>>>>> &(gc_node->active_mds_dest), >>>>>> - &evt, &out_evt, CPA_WAIT_TIME(dataSize)); >>>>>> + &evt, &out_evt, time_out); >>>>>> /* Generate rc from proc_rc */ >>>>>> switch (proc_rc) { >>>>>> @@ -4053,6 +4118,12 @@ SaAisErrorT saCkptCheckpointSynchronize( >>>>>> /* For Ckpt Sync CPA will not perform any operation >>>>>> other than passing the request to CPND */ >>>>>> + if (timeout < 0 ) { >>>>>> + TRACE_4("Cpa CheckpointSynchronize: failed with invalied >>>>>> timeout return value:%d,ckptHandle:%llx", >>>>>> SA_AIS_ERR_INVALID_PARAM, checkpointHandle); >>>>>> + rc =SA_AIS_ERR_INVALID_PARAM; >>>>>> + goto done; >>>>>> + } >>>>>> + >>>>>> /* retrieve CPA CB */ >>>>>> m_CPA_RETRIEVE_CB(cb); >>>>>> if (!cb) { >>>>>> @@ -4139,7 +4210,7 @@ SaAisErrorT saCkptCheckpointSynchronize( >>>>>> /* Convert the time from saTimeT to millisecs */ >>>>>> timeout = m_CPSV_CONVERT_SATIME_TEN_MILLI_SEC(timeout); >>>>>> - if (timeout < NCS_SAF_MIN_ACCEPT_TIME) { >>>>>> + if (timeout < CPSV_WAIT_TIME) { >>>>>> rc = SA_AIS_ERR_TIMEOUT; >>>>>> TRACE_4("cpa CkptSynchronize Api failed with return >>>>>> value:%d,ckptHandle:%llx ", rc, checkpointHandle); >>>>>> goto fail1; >>>>>> >>>>>> ------------------------------------------------------------------------------ >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Opensaf-devel mailing list >>>>>> Opensaf-devel@lists.sourceforge.net >>>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel >>>>>> >>>>> >>>> >>> >> > ------------------------------------------------------------------------------ _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel