Hi Anders Widell, Please find my answers.
On 12/22/2015 10:41 PM, Anders Widell wrote: > See comments inline. > > regards, > Anders Widell > > On 12/21/2015 11:04 AM, A V Mahesh wrote: >> Hi, >> >> - The timeout here is the duration for which the API should wait >> before returning to the user. >> - The patch does not changes the minimum value of 14 seconds. IE. >> Before this patch also the default value was 14 seconds. >> This value was set based on implementation specific fine tuning, >> particularly in large size cluster. > I think it does change the minimum timeout. I am here referring to the > functions that take an SaTimeT timeout parameter. You can try this > example program, that used to work before ticket [#1583] was pushed: > > const SaNameT checkpointName = {strlen("safCkpt=hello"), > "safCkpt=hello"}; > do { > err = saCkptCheckpointOpen( > ckptHandle, > &checkpointName, > &checkpointCreationAttributes, > SA_CKPT_CHECKPOINT_READ | > SA_CKPT_CHECKPOINT_WRITE | > SA_CKPT_CHECKPOINT_CREATE, > 10 * SA_TIME_ONE_SECOND, > &checkpointHandle); > [AVM] Even though previously ( before this patch) Application was passing 10 * SA_TIME_ONE_SECOND ( ) because of #1583 reported real issue of `SaTimeT timeout parameter is not being converted in to TEN_MILLI_SEC() while passing to mds_msg_sync_send()` internally CPA was assuming Application is passed 10 SECOND * 10000000 ( 10 * 1000000000 * 10000000) ,which is being treated as larger than the default timeout time of 14 SECOND, so previously the Application only has an impression that it will get time out in 10 SECOND , but in piratical it was considered larger than 14 SECOND default time. ================================================================================================================= Before #1583 Fix : saCkptCheckpointOpen( ckptHandle, &checkpointName, ....., 10 * SA_TIME_ONE_SECOND, &checkpointHandle); timeout = 10 * SA_TIME_ONE_SECOND; proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, &(cb->cpnd_mds_dest), &evt, &out_evt, timeout); After #1583 Fix : timeout = 10 * SA_TIME_ONE_SECOND; time_out = m_CPSV_CONVERT_SATIME_TEN_MILLI_SEC(timeout); proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, &(cb->cpnd_mds_dest), &evt, &out_evt, time_out); ================================================================================================================= In other words , before this patch in reality the saCkptCheckpointOpen() was not being returned TIMEOUT even after 10 SECOND ( but Application expecting it). Application was only in assumption it was passing 10 SECOND , so no change in behavioral difference in Application before and after this patch. > >> - Even though it is not a practical case of passing 49 days (greater >> than or equal to 0x98968000000000) as a timeout value, yes what you >> say can be done across all services. > Shouldn't we then document 0x98968000000000 as the upper limit for the > timeout parameter, and validate that the parameter given by the user > is less than this upper limit. As it is coded right now, the function > will unconditionally return SA_AIS_ERR_TIMEOUT if you specify > 0x98968000000000 as timeout (you can try this by replacing 10 * > SA_TIME_ONE_SECOND with 0x98968000000000 in the example program above). [AVM] Ok ,as if this issue exist with all service , I will raise a new ticket and will be fixed/document for all services point of view . -AVM >> >> Thanks, >> AV. >> >> >> On 12/18/2015 2:43 PM, Anders Widell wrote: >>> Hi! >>> >>> I have some questions regarding this change: >>> >>> 1) This change is not backwards compatible. Have you considered to >>> either bump the minor version of the CKPT API, or to use ELF symbol >>> versioning, to avoid breaking old applications? >>> >>> 2) Why was the minimum possible time-out increased from 0.1 seconds >>> to 14 seconds? 14 seconds is like an eternity in a real-time system. >>> >>> 3) If the user-specified timeout is less than 14 seconds, the >>> functions will always fail without even trying. Wouldn't it be >>> better to at least try to perform the operation within the time >>> limit specified by the user? The user may have a loop where it keeps >>> retrying the API function call when the SA_AIS_ERR_TIMEOUT error >>> code is returned. I would expect such a loop to eventually >>> terminate, especially if the timeout parameter has a reasonable >>> value (and there are many reasonable timeout values which are less >>> than 14 seconds). >>> >>> 3) Can't the conversion from nanoseconds represented as a signed >>> 64-bit integer to a unit of 10 milliseconds stored in an unsigned >>> 32-bit integer ('uint32_t time_out' in the code below) overflow? >>> I.e. if the timeout parameter has a value greater than or equal to >>> 0x98968000000000? >>> >>> / Anders Widell >>> >>> On 11/06/2015 06:33 AM, mahesh.va...@oracle.com wrote: >>>> osaf/libs/agents/saf/cpa/cpa_api.c | 111 >>>> ++++++++++++++++++++++++++++++------ >>>> 1 files changed, 91 insertions(+), 20 deletions(-) >>>> >>>> >>>> If we test upper and lower limits of SaTimeT timeout arguments to >>>> CPSV API , >>>> some of CPA APIs are doesn't returns or hungs. >>>> >>>> Current cpa code doesn't validating or passing SaTimeT timeout >>>> arguments to CPSV API , >>>> which are below default API timeout limit of CPSV_WAIT_TIME (1400 X >>>> 10ms) and >>>> above SaTimeT, type long long int Maximum value 9223372036854775807 >>>> (2^63-1) >>>> >>>> For example : >>>> >>>> i) const SaTimeT timeout = 1215752192; >>>> ii) const SaTimeT timeout = 2 * SA_TIME_ONE_MINUTE; >>>> iii) const SaTimeT timeout = 9223372036854775809; >>>> >>>> saCkptCheckpointOpen(...., timeout, ...); >>>> >>>> Because of that in some of variables are over flowing and , API >>>> calls are not working properly >>>> >>>> Fix : >>>> >>>> 1 - Validated if (time_out < CPSV_WAIT_TIME) return >>>> SA_AIS_ERR_INVALID_PARAM >>>> 2 - Validates if (expirationTime < 0 ) return >>>> SA_AIS_ERR_INVALID_PARAM >>>> 3 - Pass proper timeout values to internal functions of Opensaf >>>> >>>> diff --git a/osaf/libs/agents/saf/cpa/cpa_api.c >>>> b/osaf/libs/agents/saf/cpa/cpa_api.c >>>> --- a/osaf/libs/agents/saf/cpa/cpa_api.c >>>> +++ b/osaf/libs/agents/saf/cpa/cpa_api.c >>>> @@ -27,7 +27,6 @@ >>>> ******************************************************************************/ >>>> >>>> >>>> #include "cpa.h" >>>> -#define NCS_SAF_MIN_ACCEPT_TIME 10 >>>> /**************************************************************************** >>>> >>>> >>>> Name : SaCkptInitialize >>>> @@ -889,6 +888,14 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH >>>> return SA_AIS_ERR_INVALID_PARAM; >>>> } >>>> + if (timeout < 0 ) { >>>> + TRACE_4("Cpa CkptOpen: failed invalid timeout return >>>> value:%d,ckptHandle:%llx", >>>> + SA_AIS_ERR_INVALID_PARAM, ckptHandle); >>>> + TRACE_LEAVE2("API return code = %u", rc); >>>> + return SA_AIS_ERR_INVALID_PARAM; >>>> + } >>>> + >>>> + >>>> /* Draft Validations */ >>>> rc = cpa_open_attr_validate(checkpointCreationAttributes, >>>> checkpointOpenFlags, checkpointName); >>>> if (rc != SA_AIS_OK) { >>>> @@ -975,13 +982,13 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH >>>> } >>>> evt.info.cpnd.info.openReq.ckpt_flags = checkpointOpenFlags; >>>> - >>>> /* convert the timeout to 10 ms value and add it to the sync >>>> send timeout */ >>>> time_out = m_CPSV_CONVERT_SATIME_TEN_MILLI_SEC(timeout); >>>> - if (time_out < NCS_SAF_MIN_ACCEPT_TIME) { >>>> + if (time_out < CPSV_WAIT_TIME) { >>>> rc = SA_AIS_ERR_TIMEOUT; >>>> - TRACE_4("Cpa CkptOpen failed Api failed with return >>>> value:%d,ckptHandle:%llx,timeout:%llu", rc, ckptHandle, timeout); >>>> + TRACE_4("Cpa CkptOpen failed Api failed with return >>>> value:%d,ckptHandle:%llx,timeout:%llu", >>>> + rc, ckptHandle, timeout); >>>> goto mds_send_fail; >>>> } >>>> @@ -998,9 +1005,8 @@ SaAisErrorT saCkptCheckpointOpen(SaCkptH >>>> TRACE_4("cpa CkptOpen:CPND_DOWN Api failed with return >>>> value:%d,ckptHandle:%llx", rc, ckptHandle); >>>> goto mds_send_fail; >>>> } >>>> - >>>> /* Send the evt to CPND */ >>>> - proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >>>> &(cb->cpnd_mds_dest), &evt, &out_evt, timeout); >>>> + proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >>>> &(cb->cpnd_mds_dest), &evt, &out_evt, time_out); >>>> /* Generate rc from proc_rc */ >>>> switch (proc_rc) { >>>> @@ -1709,6 +1715,13 @@ SaAisErrorT saCkptCheckpointRetentionDur >>>> goto done; >>>> } >>>> + if (retentionDuration < 0 ) { >>>> + TRACE_4("Cpa RetentionDurationSet: failed invalid >>>> retentionDuration return value:%d,ckptHandle:%llx", >>>> + SA_AIS_ERR_INVALID_PARAM, checkpointHandle); >>>> + rc = SA_AIS_ERR_INVALID_PARAM; >>>> + goto done; >>>> + } >>>> + >>>> /* get the CB Lock */ >>>> if (m_NCS_LOCK(&cb->cb_lock, NCS_LOCK_WRITE) != >>>> NCSCC_RC_SUCCESS) { >>>> rc = SA_AIS_ERR_LIBRARY; >>>> @@ -2183,6 +2196,7 @@ SaAisErrorT saCkptSectionCreate(SaCkptCh >>>> bool gen_sec_flag = false; >>>> SaCkptSectionIdT app_ptr; >>>> CPA_CLIENT_NODE *cl_node = NULL; >>>> + uint32_t time_out; >>>> TRACE_ENTER2("SaCkptCheckpointHandleT passed is >>>> %llx",checkpointHandle); >>>> /* Validate the Input Parameters */ >>>> @@ -2312,8 +2326,21 @@ SaAisErrorT saCkptSectionCreate(SaCkptCh >>>> evt.info.cpnd.info.sec_creatReq.init_size = initialDataSize; >>>> m_NCS_UNLOCK(&cb->cb_lock, NCS_LOCK_WRITE); >>>> + >>>> + time_out = CPA_WAIT_TIME(initialDataSize); >>>> + >>>> + if (time_out < 0 ) { >>>> + rc = SA_AIS_ERR_TIMEOUT; >>>> + TRACE_4("cpa SectCreate Api failed because of larg >>>> initialDataSize return :%d,ckptHandle:%llx", >>>> + rc, checkpointHandle); >>>> + goto fail1; >>>> + } >>>> + else if (time_out < CPSV_WAIT_TIME) { >>>> + time_out = CPSV_WAIT_TIME; >>>> + } >>>> + >>>> proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >>>> &(gc_node->active_mds_dest), >>>> - &evt, &out_evt, CPA_WAIT_TIME(initialDataSize)); >>>> + &evt, &out_evt, time_out); >>>> /* Generate rc from proc_rc */ >>>> switch (proc_rc) { >>>> @@ -2684,6 +2711,14 @@ SaAisErrorT saCkptSectionExpirationTimeS >>>> return rc; >>>> } >>>> + if (expirationTime < 0 ) { >>>> + TRACE_4("Cpa ExpirationTimeSet: failed invalid >>>> expirationTime return value:%d,ckptHandle:%llx", >>>> + SA_AIS_ERR_INVALID_PARAM, checkpointHandle); >>>> + rc = SA_AIS_ERR_INVALID_PARAM; >>>> + TRACE_LEAVE2("API return code = %u", rc); >>>> + return rc; >>>> + } >>>> + >>>> /* retrieve CPA CB */ >>>> m_CPA_RETRIEVE_CB(cb); >>>> if (!cb) { >>>> @@ -2847,6 +2882,14 @@ SaAisErrorT saCkptSectionIterationInitia >>>> return rc; >>>> } >>>> + if (expirationTime < 0 ) { >>>> + TRACE_4("Cpa SectionIterationInitialize: failed invalid >>>> expirationTime return value:%d,ckptHandle:%llx", >>>> + SA_AIS_ERR_INVALID_PARAM, checkpointHandle); >>>> + rc = SA_AIS_ERR_INVALID_PARAM; >>>> + TRACE_LEAVE2("API return code = %u", rc); >>>> + return rc; >>>> + } >>>> + >>>> /* retrieve CPA CB */ >>>> m_CPA_RETRIEVE_CB(cb); >>>> if (!cb) { >>>> @@ -3360,6 +3403,7 @@ SaAisErrorT saCkptCheckpointWrite(SaCkpt >>>> CPA_CLIENT_NODE *cl_node = NULL; >>>> SaSizeT all_ioVector_size = 0; >>>> uint32_t err_flag = 0; >>>> + uint32_t time_out; >>>> >>>> TRACE_ENTER2("SaCkptCheckpointHandleT passed is >>>> %llx",checkpointHandle); >>>> @@ -3479,18 +3523,27 @@ SaAisErrorT saCkptCheckpointWrite(SaCkpt >>>> /* Unlock cpa_lock before calling mds api */ >>>> m_NCS_UNLOCK(&cb->cb_lock, NCS_LOCK_WRITE); >>>> -#if 1 >>>> - for (iter = 0; iter < numberOfElements; iter++) >>>> + >>>> + for (iter = 0; iter < numberOfElements; iter++) { >>>> all_ioVector_size += ioVector[iter].dataSize; >>>> - >>>> - proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >>>> &(gc_node->active_mds_dest), >>>> - &evt, &out_evt, >>>> CPA_WAIT_TIME(all_ioVector_size)); >>>> -#else >>>> - proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >>>> &(gc_node->active_mds_dest), &evt, &out_evt, CPSV_WAIT_TIME); >>>> -#endif >>>> - /* Generate rc from proc_rc */ >>>> - switch (proc_rc) >>>> - { >>>> + time_out = CPA_WAIT_TIME(all_ioVector_size); >>>> + >>>> + if (time_out < 0 ) { >>>> + rc = SA_AIS_ERR_TIMEOUT; >>>> + TRACE_4("cpa ckpointWrite Api failed because of larg >>>> ioVector size return :%d,ckptHandle:%llx", >>>> + rc, checkpointHandle); >>>> + goto done; >>>> + } >>>> + else if (time_out < CPSV_WAIT_TIME) { >>>> + time_out = CPSV_WAIT_TIME; >>>> + } >>>> + proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >>>> &(gc_node->active_mds_dest), >>>> + &evt, &out_evt, time_out); >>>> + >>>> + } >>>> + /* Generate rc from proc_rc */ >>>> + switch (proc_rc) >>>> + { >>>> case NCSCC_RC_SUCCESS: >>>> break; >>>> case NCSCC_RC_REQ_TIMOUT: >>>> @@ -3577,6 +3630,7 @@ SaAisErrorT saCkptSectionOverwrite(SaCkp >>>> CPA_GLOBAL_CKPT_NODE *gc_node = NULL; >>>> bool add_flag = false; >>>> CPA_CLIENT_NODE *cl_node = NULL; >>>> + uint32_t time_out; >>>> TRACE_ENTER2("SaCkptCheckpointHandleT passed is >>>> %llx",checkpointHandle); >>>> memset(&ckpt_data, '\0', sizeof(CPSV_CKPT_DATA)); >>>> @@ -3684,8 +3738,19 @@ SaAisErrorT saCkptSectionOverwrite(SaCkp >>>> /* Unlock cpa_lock before calling mds api */ >>>> m_NCS_UNLOCK(&cb->cb_lock, NCS_LOCK_WRITE); >>>> + time_out = CPA_WAIT_TIME(dataSize); >>>> + >>>> + if (time_out < 0 ) { >>>> + rc = SA_AIS_ERR_TIMEOUT; >>>> + TRACE_4("cpa SectionOverwrite Api failed because of larg >>>> dataSize return >>>> :%d,ckptHandle:%llx", rc, >>>> checkpointHandle); >>>> + goto done; >>>> + } >>>> + else if (time_out < CPSV_WAIT_TIME) { >>>> + time_out = CPSV_WAIT_TIME; >>>> + } >>>> + >>>> proc_rc = cpa_mds_msg_sync_send(cb->cpa_mds_hdl, >>>> &(gc_node->active_mds_dest), >>>> - &evt, &out_evt, CPA_WAIT_TIME(dataSize)); >>>> + &evt, &out_evt, time_out); >>>> /* Generate rc from proc_rc */ >>>> switch (proc_rc) { >>>> @@ -4053,6 +4118,12 @@ SaAisErrorT saCkptCheckpointSynchronize( >>>> /* For Ckpt Sync CPA will not perform any operation >>>> other than passing the request to CPND */ >>>> + if (timeout < 0 ) { >>>> + TRACE_4("Cpa CheckpointSynchronize: failed with invalied >>>> timeout return value:%d,ckptHandle:%llx", SA_AIS_ERR_INVALID_PARAM, >>>> checkpointHandle); >>>> + rc =SA_AIS_ERR_INVALID_PARAM; >>>> + goto done; >>>> + } >>>> + >>>> /* retrieve CPA CB */ >>>> m_CPA_RETRIEVE_CB(cb); >>>> if (!cb) { >>>> @@ -4139,7 +4210,7 @@ SaAisErrorT saCkptCheckpointSynchronize( >>>> /* Convert the time from saTimeT to millisecs */ >>>> timeout = m_CPSV_CONVERT_SATIME_TEN_MILLI_SEC(timeout); >>>> - if (timeout < NCS_SAF_MIN_ACCEPT_TIME) { >>>> + if (timeout < CPSV_WAIT_TIME) { >>>> rc = SA_AIS_ERR_TIMEOUT; >>>> TRACE_4("cpa CkptSynchronize Api failed with return >>>> value:%d,ckptHandle:%llx ", rc, checkpointHandle); >>>> goto fail1; >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> >>>> _______________________________________________ >>>> Opensaf-devel mailing list >>>> Opensaf-devel@lists.sourceforge.net >>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel >>>> >>> >> > ------------------------------------------------------------------------------ _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel