Hi Canh,

I have read my comment and I can see that it is not very clear, so I write this 
to clarify! I have still not looked at the code.

All try again loops shall have a timeout also during startup of a service. Log 
service responding with try again shall not stop NTF from starting.
It shall be avoided to do any initialization involving try again loops inside a 
callback function since it may delay returning and by that affect external 
timing.
A solution could be to not initialize the log service during start up and in 
any callback function (is this happening in AMF callback when for example doing 
a switch over?)
I also assume that no log records has to be written during start up or in any 
callback. If this is correct a solution for handling log service initialization 
could be to not initialize the log service during start up. Instead, if log 
service returns BAD HANDLE when trying to write a log record it is not 
initialized (the handle check is done in the log service API and is done just 
as fast as if a flag in NTF is used for this). If this happen just initialize 
and then redo the write. This should be handled internally in a NTF function or 
class handling writing of log records.
Such a solution means that the log service will be initialized the first time 
NTF needs to write a log record and if for some reason BAD HANDLE is returned 
in some other time.
Also it should not matter if the log service is not finalized when NTF becomes 
standby meaning that both standby and active NTF may be log service clients. 
Standby does not log anything.

Regards
Lennart

> -----Original Message-----
> From: Lennart Lund
> Sent: den 3 augusti 2018 14:19
> To: Minh Hon Chau <minh.c...@dektech.com.au>; Canh Van Truong
> <canh.v.tru...@dektech.com.au>; Vu Minh Nguyen
> <vu.m.ngu...@dektech.com.au>
> Cc: opensaf-devel@lists.sourceforge.net; Lennart Lund
> <lennart.l...@ericsson.com>
> Subject: RE: [devel] [PATCH 1/1] ntf: Update timeout for initializing log 
> client
> [#2878]
> 
> Hi Canh
> 
> I happened to see this
> 
> It is always correct to have a timed try again loop. If initializing the log 
> service
> time out in the AMF callback may not a big issue. If this happen NTF will get 
> a
> BAD_HANDLE reply when trying to write a log record. If this happen, just
> initialize the log service and then write the log record (initializing in 
> this case
> means initializing and open the alarm stream, only alarms are logged). If
> initializing fail at this time it may be considered a bit more serious. The
> recovery in this case may be a node restart initiated by an abort. I assume
> writing log records is not done in any AMF callback function it does not
> matter if the try again loop has a bit longer timeout time than it can have in
> an AMF callback function. I think initializing the log service in AMF callback
> shall be removed. It is good enough if it is done the first time writing a log
> record is requested and if the log handle becomes invalid for some reason.
> I haven't checked the code yet but I will take a look
> 
> Regards
> Lennart
> 
> > -----Original Message-----
> > From: Minh Hon Chau <minh.c...@dektech.com.au>
> > Sent: den 3 augusti 2018 09:32
> > To: Canh Van Truong <canh.v.tru...@dektech.com.au>; Vu Minh Nguyen
> > <vu.m.ngu...@dektech.com.au>
> > Cc: opensaf-devel@lists.sourceforge.net
> > Subject: Re: [devel] [PATCH 1/1] ntf: Update timeout for initializing log
> client
> > [#2878]
> >
> > Hi Canh,
> >
> > This issue seems to be partially fixed, we now prioritize to respond to
> > AMF callback. But if LOG service is still not responsive, NTF will run
> > without logging capability. I think we can address the second issue in
> > another ticket.
> >
> > Ack from me for code review, one comment in line.
> >
> > Thanks
> >
> > Minh
> >
> >
> > On 26/06/18 16:53, Canh Van Truong wrote:
> > > When NTFD is started, the NtfAdmin class object is defined and log client
> is
> > > initialized in this object. The initialiation of client may be loop 
> > > forever
> > > if TRY_AGAIN error always return. Somehow log service has not been
> > started on
> > > time or log service is busy, NTFD is kept in the loop. And the AMF hasn't
> > > received csi callback.
> > >
> > > This patch update the timeout when initializing log client.
> > > ---
> > >  src/ntf/ntfd/NtfLogger.cc | 165 +++++++++++++++++++++++-------------
> --
> > --------
> > >  src/ntf/ntfd/NtfLogger.h  |   4 +-
> > >  2 files changed, 85 insertions(+), 84 deletions(-)
> > >
> > > diff --git a/src/ntf/ntfd/NtfLogger.cc b/src/ntf/ntfd/NtfLogger.cc
> > > index fd17c58a2..9878824c3 100644
> > > --- a/src/ntf/ntfd/NtfLogger.cc
> > > +++ b/src/ntf/ntfd/NtfLogger.cc
> > > @@ -20,17 +20,18 @@
> > >   *   INCLUDE FILES
> > >   *
> >
> ==========================================================
> > ==============
> > >   */
> > > -#include <sys/poll.h>
> > > +#include "ntf/ntfd/NtfLogger.h"
> > >
> > > -#include "base/osaf_utility.h"
> > > +#include <sys/poll.h>
> > >  #include <saAis.h>
> > >  #include <saLog.h>
> > > -#include "ntf/ntfd/NtfAdmin.h"
> > > -#include "ntf/ntfd/NtfLogger.h"
> > >  #include "ntfs_com.h"
> > > -#include "base/logtrace.h"
> > >  #include "ntf/common/ntfsv_mem.h"
> > > +#include "ntf/ntfd/NtfAdmin.h"
> > > +#include "base/logtrace.h"
> > >  #include "base/osaf_extended_name.h"
> > > +#include "base/osaf_utility.h"
> > > +#include "base/osaf_time.h"
> > >
> > >  /*
> >
> ==========================================================
> > ==============
> > >   *   DEFINITIONS
> > > @@ -38,7 +39,6 @@
> > >   */
> > >  #define LOG_OPEN_TIMEOUT SA_TIME_ONE_SECOND
> > >  #define LOG_NOTIFICATION_TIMEOUT SA_TIME_ONE_SECOND
> > > -#define AIS_TIMEOUT 500000
> > >
> > >  /*
> >
> ==========================================================
> > ==============
> > >   *   TYPE DEFINITIONS
> > > @@ -65,15 +65,15 @@ void saLogWriteLogCallback(SaInvocationT
> > invocation, SaAisErrorT error);
> > >   */
> > >
> > >  static SaLogCallbacksT logCallbacks = {NULL, NULL,
> saLogWriteLogCallback};
> > > -
> > >  static SaLogHandleT logHandle;
> > >  static SaLogStreamHandleT alarmStreamHandle;
> > > +static bool log_client_initialized = false;
> > >  const SaVersionT kLogVersion = {'A', 2, 3};
> > > +const uint64_t kWaitTime = 8 * 1000; // Wait for timeout is 8 seconds
> > >
> > >  NtfLogger::NtfLogger() : readCounter(0) {
> > >    if (SA_AIS_OK != initLog()) {
> > > -    LOG_ER("initialize saflog failed exiting...");
> > > -    exit(EXIT_FAILURE);
> > > +    LOG_WA("Initialize saflog failed");
> > >    }
> > >  }
> > [Minh]: I think the exit() was there if initLog() fails, so that NID/AMF
> > can restart the NTF service. We can remove it unless NTF has completed
> > recycling LOG handle from BAD_HANDLE/TIMEOUT/...
> > >
> > > @@ -97,6 +97,8 @@ void saLogWriteLogCallback(SaInvocationT
> invocation,
> > SaAisErrorT error) {
> > >
> > >      TRACE_1("Error when logging (%d), queue for relogging", error);
> > >
> > > +    if (error == SA_AIS_ERR_BAD_HANDLE) log_client_initialized = false;
> > > +
> > >      notification = NtfAdmin::theNtfAdmin->getNotificationById(
> > >          (SaNtfIdentifierT)invocation);
> > >
> > > @@ -151,6 +153,15 @@ void NtfLogger::queueNotifcation(NtfSmartPtr&
> > notif) {
> > >
> > >  void NtfLogger::checkQueueAndLog(NtfSmartPtr& newNotif) {
> > >    TRACE_ENTER();
> > > +
> > > +  // Initialize client if it hasn't
> > > +  if (log_client_initialized == false) {
> > > +    if (SA_AIS_OK != initLog()) {
> > > +      queueNotifcation(newNotif);
> > > +      return;
> > > +    }
> > > +  }
> > > +
> > >    /* Check if there are not logged notifications in queue */
> > >    while (!queuedNotificationList.empty()) {
> > >      NtfSmartPtr notification = queuedNotificationList.front();
> > > @@ -174,7 +185,7 @@ void
> NtfLogger::checkQueueAndLog(NtfSmartPtr&
> > newNotif) {
> > >
> > >  SaAisErrorT NtfLogger::logNotification(NtfSmartPtr& notif) {
> > >    /* Write to the log if we're the local node */
> > > -  SaAisErrorT errorCode = SA_AIS_OK;
> > > +  SaAisErrorT rc = SA_AIS_OK;
> > >    SaLogHeaderT logHeader;
> > >    char addTextBuf[MAX_ADDITIONAL_TEXT_LENGTH] = {0};
> > >    SaLogBufferT logBuffer;
> > > @@ -219,10 +230,10 @@ SaAisErrorT
> > NtfLogger::logNotification(NtfSmartPtr& notif) {
> > >    if ((notif->sendNotInfo_->notificationType == SA_NTF_TYPE_ALARM)
> ||
> > >        (notif->sendNotInfo_->notificationType ==
> > SA_NTF_TYPE_SECURITY_ALARM)) {
> > >      TRACE_2("Logging notification to alarm stream");
> > > -    errorCode =
> > > +    rc =
> > >          saLogWriteLogAsync(alarmStreamHandle, notif->getNotificationId(),
> > >                             SA_LOG_RECORD_WRITE_ACK, &logRecord);
> > > -    switch (errorCode) {
> > > +    switch (rc) {
> > >        case SA_AIS_OK:
> > >          break;
> > >
> > > @@ -230,107 +241,97 @@ SaAisErrorT
> > NtfLogger::logNotification(NtfSmartPtr& notif) {
> > >        case SA_AIS_ERR_TRY_AGAIN:
> > >        case SA_AIS_ERR_TIMEOUT:
> > >          TRACE("Failed to log notification (ret: %d). Try next time.",
> > > -              errorCode);
> > > +              rc);
> > > +        break;
> > > +      case SA_AIS_ERR_BAD_HANDLE:
> > > +        TRACE("BAD HANDLE. Log client should be re-initialized");
> > > +        log_client_initialized = false;
> > >          break;
> > > -
> > >        default:
> > > -        osaf_abort(errorCode);
> > > +        LOG_ER("Writing failed: rc = %d", rc);
> > > +        osaf_abort(rc);
> > >      }
> > >    }
> > >
> > >    TRACE_LEAVE();
> > > -  return errorCode;
> > > +  return rc;
> > >  }
> > >
> > >  SaAisErrorT NtfLogger::initLog() {
> > > -  SaAisErrorT result;
> > > +  SaAisErrorT rc = SA_AIS_OK;
> > >    SaNameT alarmStreamName;
> > > -  osaf_extended_name_lend(SA_LOG_STREAM_ALARM,
> > &alarmStreamName);
> > > -  int first_try = 1;
> > >    SaVersionT log_version = kLogVersion;
> > > +  struct timespec timeout_time;
> > >
> > >    TRACE_ENTER();
> > >
> > > -  /* Initialize the Log service */
> > > -  do {
> > > -    result = saLogInitialize(&logHandle, &logCallbacks, &log_version);
> > > -    if (SA_AIS_ERR_TRY_AGAIN == result) {
> > > -      if (first_try) {
> > > -        LOG_WA("saLogInitialize returns try again, retries...");
> > > -        first_try = 0;
> > > -      }
> > > -      usleep(AIS_TIMEOUT);
> > > -      log_version = kLogVersion;
> > > -    }
> > > -  } while (SA_AIS_ERR_TRY_AGAIN == result);
> > > +  osaf_extended_name_lend(SA_LOG_STREAM_ALARM,
> > &alarmStreamName);
> > >
> > > -  if (SA_AIS_OK != result) {
> > > -    LOG_ER("Log initialize result is %d", result);
> > > -    goto exit_point;
> > > -  }
> > > -  if (!first_try) {
> > > -    LOG_IN("saLogInitialize ok");
> > > -    first_try = 1;
> > > +  // Initialize the Log service
> > > +  osaf_set_millis_timeout(kWaitTime, &timeout_time);
> > > +  while (!osaf_is_timeout(&timeout_time)) {
> > > +    rc = saLogInitialize(&logHandle, &logCallbacks, &log_version);
> > > +    if (rc != SA_AIS_ERR_TRY_AGAIN) break;
> > > +    log_version = kLogVersion;
> > > +    osaf_nanosleep(&kHundredMilliseconds);
> > >    }
> > >
> > > -  /* Get file descriptor to use in select */
> > > -  do {
> > > -    result = saLogSelectionObjectGet(logHandle, &ntfs_cb-
> > >logSelectionObject);
> > > -    if (SA_AIS_ERR_TRY_AGAIN == result) {
> > > -      if (first_try) {
> > > -        LOG_WA("saLogSelectionObjectGet returns try again, retries...");
> > > -        first_try = 0;
> > > -      }
> > > -      usleep(AIS_TIMEOUT);
> > > -    }
> > > -  } while (SA_AIS_ERR_TRY_AGAIN == result);
> > > +  if (rc != SA_AIS_OK) {
> > > +    LOG_WA("Initialization of log client failed rc = %d", rc);
> > > +    return rc;
> > > +  } else {
> > > +    log_client_initialized = true;
> > > +  }
> > >
> > > -  if (SA_AIS_OK != result) {
> > > -    LOG_ER("Log SelectionObjectGet result is %d", result);
> > > -    goto exit_point;
> > > +  // Get file descriptor to use in select
> > > +  osaf_set_millis_timeout(kWaitTime, &timeout_time);
> > > +  while (!osaf_is_timeout(&timeout_time)) {
> > > +    rc = saLogSelectionObjectGet(logHandle, &ntfs_cb-
> > >logSelectionObject);
> > > +    if (rc != SA_AIS_ERR_TRY_AGAIN) break;
> > > +    osaf_nanosleep(&kHundredMilliseconds);
> > >    }
> > >
> > > -  if (SA_AIS_OK != result) {
> > > -    LOG_ER("Failed to open the notification log stream (%d)", result);
> > > +  if (rc != SA_AIS_OK) {
> > > +    LOG_WA("Log SelectionObjectGet failed rc = %d", rc);
> > >      goto exit_point;
> > >    }
> > > -  if (!first_try) {
> > > -    LOG_IN("saLogSelectionObjectGet ok");
> > > -    first_try = 1;
> > > -  }
> > >
> > > -  /* Open the alarm stream */
> > > -  do {
> > > -    result = saLogStreamOpen_2(logHandle, &alarmStreamName, NULL, 0,
> > > -                               LOG_OPEN_TIMEOUT, &alarmStreamHandle);
> > > -    if (SA_AIS_ERR_TRY_AGAIN == result) {
> > > -      if (first_try) {
> > > -        LOG_WA("saLogStreamOpen_2 returns try again, retries...");
> > > -        first_try = 0;
> > > -      }
> > > -      usleep(AIS_TIMEOUT);
> > > -    }
> > > -  } while (SA_AIS_ERR_TRY_AGAIN == result);
> > > -
> > > -  if (SA_AIS_OK != result) {
> > > -    LOG_ER("Failed to open the alarm log stream (%d)", result);
> > > -    goto exit_point;
> > > +  // Open the alarm stream
> > > +  osaf_set_millis_timeout(kWaitTime, &timeout_time);
> > > +  while (!osaf_is_timeout(&timeout_time)) {
> > > +    rc = saLogStreamOpen_2(logHandle, &alarmStreamName, NULL, 0,
> > > +                           LOG_OPEN_TIMEOUT, &alarmStreamHandle);
> > > +    if (rc != SA_AIS_ERR_TRY_AGAIN) break;
> > > +    osaf_nanosleep(&kHundredMilliseconds);
> > >    }
> > > -  if (!first_try) {
> > > -    LOG_IN("saLogStreamOpen_2 ok");
> > > -    first_try = 1;
> > > +
> > > +  if (rc != SA_AIS_OK) {
> > > +    LOG_WA("Log saLogStreamOpen_2 failed rc = %d", rc);
> > >    }
> > >
> > >  exit_point:
> > > +  if (rc != SA_AIS_OK) {
> > > +    // Finalize client
> > > +    osaf_set_millis_timeout(kWaitTime/2, &timeout_time);
> > > +    while (!osaf_is_timeout(&timeout_time)) {
> > > +      SaAisErrorT ret = saLogFinalize(logHandle);
> > > +      if (ret != SA_AIS_ERR_TRY_AGAIN) break;
> > > +      osaf_nanosleep(&kHundredMilliseconds);
> > > +    }
> > > +    log_client_initialized = false;
> > > +  }
> > >    TRACE_LEAVE();
> > > -  return (result);
> > > +  return rc;
> > >  }
> > >
> > >  void logEvent() {
> > > -  SaAisErrorT errorCode;
> > > -  errorCode = saLogDispatch(logHandle, SA_DISPATCH_ALL);
> > > -  if (SA_AIS_OK != errorCode) {
> > > -    TRACE_1("Failed to dispatch log events (%d)", errorCode);
> > > +  SaAisErrorT rc;
> > > +  rc = saLogDispatch(logHandle, SA_DISPATCH_ALL);
> > > +  if (rc != SA_AIS_OK) {
> > > +    LOG_WA("Fail to dispatch log events rc = %d", rc);
> > > +    if (rc == SA_AIS_ERR_BAD_HANDLE) {
> > > +      log_client_initialized = false;
> > > +    }
> > >    }
> > >    return;
> > >  }
> > > diff --git a/src/ntf/ntfd/NtfLogger.h b/src/ntf/ntfd/NtfLogger.h
> > > index cc0ac7dba..739b97511 100644
> > > --- a/src/ntf/ntfd/NtfLogger.h
> > > +++ b/src/ntf/ntfd/NtfLogger.h
> > > @@ -56,14 +56,14 @@ class NtfLogger {
> > >    //    virtual ~NtfLogger();
> > >
> > >    void log(NtfSmartPtr& notif, bool isLocal);
> > > -  void checkQueueAndLog(NtfSmartPtr& notif);
> > > -  SaAisErrorT logNotification(NtfSmartPtr& notif);
> > >    void queueNotifcation(NtfSmartPtr& notif);
> > >    void printInfo();
> > >    void syncRequest(NCS_UBAID *uba);
> > >
> > >   private:
> > >    SaAisErrorT initLog();
> > > +  void checkQueueAndLog(NtfSmartPtr& notif);
> > > +  SaAisErrorT logNotification(NtfSmartPtr& notif);
> > >
> > >    readerNotificationListT coll_;
> > >    unsigned int readCounter;
> >
> >
> >
> > ------------------------------------------------------------------------------
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> > _______________________________________________
> > Opensaf-devel mailing list
> > Opensaf-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/opensaf-devel

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to