Hi Lennart,

Thanks for your comments.
For initLog() delay when ntf start, I am working it with creating new thread
to do it to avoid this issue. And I will also update your comments in new
version.

Regards
Canh



-----Original Message-----
From: Lennart Lund <lennart.l...@ericsson.com> 
Sent: Monday, August 13, 2018 6:39 PM
To: Minh Hon Chau <minh.c...@dektech.com.au>; Canh Van Truong
<canh.v.tru...@dektech.com.au>; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au>
Cc: opensaf-devel@lists.sourceforge.net; Lennart Lund
<lennart.l...@ericsson.com>
Subject: RE: [devel] [PATCH 1/1] ntf: Update timeout for initializing log
client [#2878]

Hi Canh,

I'm not sure if my comment still is unclear so I have written comments in
the code to further explain how the changes I suggest below can be done.
As usual I have attached a diff that can be applied on the review branch.
Look for [Lennart]
Note that I have not tested anything and there may be things I have missed
but the principle should be valid
If you fix according to my suggestion the fix will be smaller and affect
less code also the log_client_initialized flag that causes unwanted
incorrect dependencies can be removed.

A general comment. I can see that almost no functions, classes etc. are
describe in any other way than by a that in some cases are somewhat
descriptive. However this is not good enough. In many cases a better name is
needed and in most cases for example functions do more ore something else
than the name says. Also return codes are incorrect in some cases.
I suggest that this is fixed, at least for functions that are affected by
other fixes. I have written a comment about that for one affected function
in the commented code

Thanks
Lennart

> -----Original Message-----
> From: Lennart Lund
> Sent: den 6 augusti 2018 10:44
> To: Minh Hon Chau <minh.c...@dektech.com.au>; Canh Van Truong
> <canh.v.tru...@dektech.com.au>; Vu Minh Nguyen
> <vu.m.ngu...@dektech.com.au>
> Cc: opensaf-devel@lists.sourceforge.net; Lennart Lund
> <lennart.l...@ericsson.com>
> Subject: RE: [devel] [PATCH 1/1] ntf: Update timeout for initializing log
client
> [#2878]
>
> Hi Canh,
>
> I have read my comment and I can see that it is not very clear, so I write
this
> to clarify! I have still not looked at the code.
>
> All try again loops shall have a timeout also during startup of a service.
Log
> service responding with try again shall not stop NTF from starting.
> It shall be avoided to do any initialization involving try again loops
inside a
> callback function since it may delay returning and by that affect external
> timing.
> A solution could be to not initialize the log service during start up and
in any
> callback function (is this happening in AMF callback when for example
doing a
> switch over?)
> I also assume that no log records has to be written during start up or in
any
> callback. If this is correct a solution for handling log service
initialization could
> be to not initialize the log service during start up. Instead, if log
service
> returns BAD HANDLE when trying to write a log record it is not initialized
(the
> handle check is done in the log service API and is done just as fast as if
a flag
> in NTF is used for this). If this happen just initialize and then redo the
write.
> This should be handled internally in a NTF function or class handling
writing of
> log records.
> Such a solution means that the log service will be initialized the first
time NTF
> needs to write a log record and if for some reason BAD HANDLE is returned
in
> some other time.
> Also it should not matter if the log service is not finalized when NTF
becomes
> standby meaning that both standby and active NTF may be log service
clients.
> Standby does not log anything.
>
> Regards
> Lennart
>
> > -----Original Message-----
> > From: Lennart Lund
> > Sent: den 3 augusti 2018 14:19
> > To: Minh Hon Chau <minh.c...@dektech.com.au>; Canh Van Truong
> > <canh.v.tru...@dektech.com.au>; Vu Minh Nguyen
> > <vu.m.ngu...@dektech.com.au>
> > Cc: opensaf-devel@lists.sourceforge.net; Lennart Lund
> > <lennart.l...@ericsson.com>
> > Subject: RE: [devel] [PATCH 1/1] ntf: Update timeout for initializing
log
> client
> > [#2878]
> >
> > Hi Canh
> >
> > I happened to see this
> >
> > It is always correct to have a timed try again loop. If initializing the
log
> service
> > time out in the AMF callback may not a big issue. If this happen NTF
will get
> a
> > BAD_HANDLE reply when trying to write a log record. If this happen, just
> > initialize the log service and then write the log record (initializing
in this case
> > means initializing and open the alarm stream, only alarms are logged).
If
> > initializing fail at this time it may be considered a bit more serious.
The
> > recovery in this case may be a node restart initiated by an abort. I
assume
> > writing log records is not done in any AMF callback function it does not
> > matter if the try again loop has a bit longer timeout time than it can
have in
> > an AMF callback function. I think initializing the log service in AMF
callback
> > shall be removed. It is good enough if it is done the first time writing
a log
> > record is requested and if the log handle becomes invalid for some
reason.
> > I haven't checked the code yet but I will take a look
> >
> > Regards
> > Lennart
> >
> > > -----Original Message-----
> > > From: Minh Hon Chau <minh.c...@dektech.com.au>
> > > Sent: den 3 augusti 2018 09:32
> > > To: Canh Van Truong <canh.v.tru...@dektech.com.au>; Vu Minh
> Nguyen
> > > <vu.m.ngu...@dektech.com.au>
> > > Cc: opensaf-devel@lists.sourceforge.net
> > > Subject: Re: [devel] [PATCH 1/1] ntf: Update timeout for initializing
log
> > client
> > > [#2878]
> > >
> > > Hi Canh,
> > >
> > > This issue seems to be partially fixed, we now prioritize to respond
to
> > > AMF callback. But if LOG service is still not responsive, NTF will run
> > > without logging capability. I think we can address the second issue in
> > > another ticket.
> > >
> > > Ack from me for code review, one comment in line.
> > >
> > > Thanks
> > >
> > > Minh
> > >
> > >
> > > On 26/06/18 16:53, Canh Van Truong wrote:
> > > > When NTFD is started, the NtfAdmin class object is defined and log
> client
> > is
> > > > initialized in this object. The initialiation of client may be loop
forever
> > > > if TRY_AGAIN error always return. Somehow log service has not been
> > > started on
> > > > time or log service is busy, NTFD is kept in the loop. And the AMF
hasn't
> > > > received csi callback.
> > > >
> > > > This patch update the timeout when initializing log client.
> > > > ---
> > > >  src/ntf/ntfd/NtfLogger.cc | 165 +++++++++++++++++++++++-----------
> --
> > --
> > > --------
> > > >  src/ntf/ntfd/NtfLogger.h  |   4 +-
> > > >  2 files changed, 85 insertions(+), 84 deletions(-)
> > > >
> > > > diff --git a/src/ntf/ntfd/NtfLogger.cc b/src/ntf/ntfd/NtfLogger.cc
> > > > index fd17c58a2..9878824c3 100644
> > > > --- a/src/ntf/ntfd/NtfLogger.cc
> > > > +++ b/src/ntf/ntfd/NtfLogger.cc
> > > > @@ -20,17 +20,18 @@
> > > >   *   INCLUDE FILES
> > > >   *
> > >
> >
> ==========================================================
> > > ==============
> > > >   */
> > > > -#include <sys/poll.h>
> > > > +#include "ntf/ntfd/NtfLogger.h"
> > > >
> > > > -#include "base/osaf_utility.h"
> > > > +#include <sys/poll.h>
> > > >  #include <saAis.h>
> > > >  #include <saLog.h>
> > > > -#include "ntf/ntfd/NtfAdmin.h"
> > > > -#include "ntf/ntfd/NtfLogger.h"
> > > >  #include "ntfs_com.h"
> > > > -#include "base/logtrace.h"
> > > >  #include "ntf/common/ntfsv_mem.h"
> > > > +#include "ntf/ntfd/NtfAdmin.h"
> > > > +#include "base/logtrace.h"
> > > >  #include "base/osaf_extended_name.h"
> > > > +#include "base/osaf_utility.h"
> > > > +#include "base/osaf_time.h"
> > > >
> > > >  /*
> > >
> >
> ==========================================================
> > > ==============
> > > >   *   DEFINITIONS
> > > > @@ -38,7 +39,6 @@
> > > >   */
> > > >  #define LOG_OPEN_TIMEOUT SA_TIME_ONE_SECOND
> > > >  #define LOG_NOTIFICATION_TIMEOUT SA_TIME_ONE_SECOND
> > > > -#define AIS_TIMEOUT 500000
> > > >
> > > >  /*
> > >
> >
> ==========================================================
> > > ==============
> > > >   *   TYPE DEFINITIONS
> > > > @@ -65,15 +65,15 @@ void saLogWriteLogCallback(SaInvocationT
> > > invocation, SaAisErrorT error);
> > > >   */
> > > >
> > > >  static SaLogCallbacksT logCallbacks = {NULL, NULL,
> > saLogWriteLogCallback};
> > > > -
> > > >  static SaLogHandleT logHandle;
> > > >  static SaLogStreamHandleT alarmStreamHandle;
> > > > +static bool log_client_initialized = false;
> > > >  const SaVersionT kLogVersion = {'A', 2, 3};
> > > > +const uint64_t kWaitTime = 8 * 1000; // Wait for timeout is 8
seconds
> > > >
> > > >  NtfLogger::NtfLogger() : readCounter(0) {
> > > >    if (SA_AIS_OK != initLog()) {
> > > > -    LOG_ER("initialize saflog failed exiting...");
> > > > -    exit(EXIT_FAILURE);
> > > > +    LOG_WA("Initialize saflog failed");
> > > >    }
> > > >  }
> > > [Minh]: I think the exit() was there if initLog() fails, so that
NID/AMF
> > > can restart the NTF service. We can remove it unless NTF has completed
> > > recycling LOG handle from BAD_HANDLE/TIMEOUT/...
> > > >
> > > > @@ -97,6 +97,8 @@ void saLogWriteLogCallback(SaInvocationT
> > invocation,
> > > SaAisErrorT error) {
> > > >
> > > >      TRACE_1("Error when logging (%d), queue for relogging", error);
> > > >
> > > > +    if (error == SA_AIS_ERR_BAD_HANDLE) log_client_initialized =
false;
> > > > +
> > > >      notification = NtfAdmin::theNtfAdmin->getNotificationById(
> > > >          (SaNtfIdentifierT)invocation);
> > > >
> > > > @@ -151,6 +153,15 @@ void
> NtfLogger::queueNotifcation(NtfSmartPtr&
> > > notif) {
> > > >
> > > >  void NtfLogger::checkQueueAndLog(NtfSmartPtr& newNotif) {
> > > >    TRACE_ENTER();
> > > > +
> > > > +  // Initialize client if it hasn't
> > > > +  if (log_client_initialized == false) {
> > > > +    if (SA_AIS_OK != initLog()) {
> > > > +      queueNotifcation(newNotif);
> > > > +      return;
> > > > +    }
> > > > +  }
> > > > +
> > > >    /* Check if there are not logged notifications in queue */
> > > >    while (!queuedNotificationList.empty()) {
> > > >      NtfSmartPtr notification = queuedNotificationList.front();
> > > > @@ -174,7 +185,7 @@ void
> > NtfLogger::checkQueueAndLog(NtfSmartPtr&
> > > newNotif) {
> > > >
> > > >  SaAisErrorT NtfLogger::logNotification(NtfSmartPtr& notif) {
> > > >    /* Write to the log if we're the local node */
> > > > -  SaAisErrorT errorCode = SA_AIS_OK;
> > > > +  SaAisErrorT rc = SA_AIS_OK;
> > > >    SaLogHeaderT logHeader;
> > > >    char addTextBuf[MAX_ADDITIONAL_TEXT_LENGTH] = {0};
> > > >    SaLogBufferT logBuffer;
> > > > @@ -219,10 +230,10 @@ SaAisErrorT
> > > NtfLogger::logNotification(NtfSmartPtr& notif) {
> > > >    if ((notif->sendNotInfo_->notificationType == SA_NTF_TYPE_ALARM)
> > ||
> > > >        (notif->sendNotInfo_->notificationType ==
> > > SA_NTF_TYPE_SECURITY_ALARM)) {
> > > >      TRACE_2("Logging notification to alarm stream");
> > > > -    errorCode =
> > > > +    rc =
> > > >          saLogWriteLogAsync(alarmStreamHandle, notif-
> >getNotificationId(),
> > > >                             SA_LOG_RECORD_WRITE_ACK, &logRecord);
> > > > -    switch (errorCode) {
> > > > +    switch (rc) {
> > > >        case SA_AIS_OK:
> > > >          break;
> > > >
> > > > @@ -230,107 +241,97 @@ SaAisErrorT
> > > NtfLogger::logNotification(NtfSmartPtr& notif) {
> > > >        case SA_AIS_ERR_TRY_AGAIN:
> > > >        case SA_AIS_ERR_TIMEOUT:
> > > >          TRACE("Failed to log notification (ret: %d). Try next
time.",
> > > > -              errorCode);
> > > > +              rc);
> > > > +        break;
> > > > +      case SA_AIS_ERR_BAD_HANDLE:
> > > > +        TRACE("BAD HANDLE. Log client should be re-initialized");
> > > > +        log_client_initialized = false;
> > > >          break;
> > > > -
> > > >        default:
> > > > -        osaf_abort(errorCode);
> > > > +        LOG_ER("Writing failed: rc = %d", rc);
> > > > +        osaf_abort(rc);
> > > >      }
> > > >    }
> > > >
> > > >    TRACE_LEAVE();
> > > > -  return errorCode;
> > > > +  return rc;
> > > >  }
> > > >
> > > >  SaAisErrorT NtfLogger::initLog() {
> > > > -  SaAisErrorT result;
> > > > +  SaAisErrorT rc = SA_AIS_OK;
> > > >    SaNameT alarmStreamName;
> > > > -  osaf_extended_name_lend(SA_LOG_STREAM_ALARM,
> > > &alarmStreamName);
> > > > -  int first_try = 1;
> > > >    SaVersionT log_version = kLogVersion;
> > > > +  struct timespec timeout_time;
> > > >
> > > >    TRACE_ENTER();
> > > >
> > > > -  /* Initialize the Log service */
> > > > -  do {
> > > > -    result = saLogInitialize(&logHandle, &logCallbacks,
&log_version);
> > > > -    if (SA_AIS_ERR_TRY_AGAIN == result) {
> > > > -      if (first_try) {
> > > > -        LOG_WA("saLogInitialize returns try again, retries...");
> > > > -        first_try = 0;
> > > > -      }
> > > > -      usleep(AIS_TIMEOUT);
> > > > -      log_version = kLogVersion;
> > > > -    }
> > > > -  } while (SA_AIS_ERR_TRY_AGAIN == result);
> > > > +  osaf_extended_name_lend(SA_LOG_STREAM_ALARM,
> > > &alarmStreamName);
> > > >
> > > > -  if (SA_AIS_OK != result) {
> > > > -    LOG_ER("Log initialize result is %d", result);
> > > > -    goto exit_point;
> > > > -  }
> > > > -  if (!first_try) {
> > > > -    LOG_IN("saLogInitialize ok");
> > > > -    first_try = 1;
> > > > +  // Initialize the Log service
> > > > +  osaf_set_millis_timeout(kWaitTime, &timeout_time);
> > > > +  while (!osaf_is_timeout(&timeout_time)) {
> > > > +    rc = saLogInitialize(&logHandle, &logCallbacks, &log_version);
> > > > +    if (rc != SA_AIS_ERR_TRY_AGAIN) break;
> > > > +    log_version = kLogVersion;
> > > > +    osaf_nanosleep(&kHundredMilliseconds);
> > > >    }
> > > >
> > > > -  /* Get file descriptor to use in select */
> > > > -  do {
> > > > -    result = saLogSelectionObjectGet(logHandle, &ntfs_cb-
> > > >logSelectionObject);
> > > > -    if (SA_AIS_ERR_TRY_AGAIN == result) {
> > > > -      if (first_try) {
> > > > -        LOG_WA("saLogSelectionObjectGet returns try again,
retries...");
> > > > -        first_try = 0;
> > > > -      }
> > > > -      usleep(AIS_TIMEOUT);
> > > > -    }
> > > > -  } while (SA_AIS_ERR_TRY_AGAIN == result);
> > > > +  if (rc != SA_AIS_OK) {
> > > > +    LOG_WA("Initialization of log client failed rc = %d", rc);
> > > > +    return rc;
> > > > +  } else {
> > > > +    log_client_initialized = true;
> > > > +  }
> > > >
> > > > -  if (SA_AIS_OK != result) {
> > > > -    LOG_ER("Log SelectionObjectGet result is %d", result);
> > > > -    goto exit_point;
> > > > +  // Get file descriptor to use in select
> > > > +  osaf_set_millis_timeout(kWaitTime, &timeout_time);
> > > > +  while (!osaf_is_timeout(&timeout_time)) {
> > > > +    rc = saLogSelectionObjectGet(logHandle, &ntfs_cb-
> > > >logSelectionObject);
> > > > +    if (rc != SA_AIS_ERR_TRY_AGAIN) break;
> > > > +    osaf_nanosleep(&kHundredMilliseconds);
> > > >    }
> > > >
> > > > -  if (SA_AIS_OK != result) {
> > > > -    LOG_ER("Failed to open the notification log stream (%d)",
result);
> > > > +  if (rc != SA_AIS_OK) {
> > > > +    LOG_WA("Log SelectionObjectGet failed rc = %d", rc);
> > > >      goto exit_point;
> > > >    }
> > > > -  if (!first_try) {
> > > > -    LOG_IN("saLogSelectionObjectGet ok");
> > > > -    first_try = 1;
> > > > -  }
> > > >
> > > > -  /* Open the alarm stream */
> > > > -  do {
> > > > -    result = saLogStreamOpen_2(logHandle, &alarmStreamName, NULL,
> 0,
> > > > -                               LOG_OPEN_TIMEOUT,
&alarmStreamHandle);
> > > > -    if (SA_AIS_ERR_TRY_AGAIN == result) {
> > > > -      if (first_try) {
> > > > -        LOG_WA("saLogStreamOpen_2 returns try again, retries...");
> > > > -        first_try = 0;
> > > > -      }
> > > > -      usleep(AIS_TIMEOUT);
> > > > -    }
> > > > -  } while (SA_AIS_ERR_TRY_AGAIN == result);
> > > > -
> > > > -  if (SA_AIS_OK != result) {
> > > > -    LOG_ER("Failed to open the alarm log stream (%d)", result);
> > > > -    goto exit_point;
> > > > +  // Open the alarm stream
> > > > +  osaf_set_millis_timeout(kWaitTime, &timeout_time);
> > > > +  while (!osaf_is_timeout(&timeout_time)) {
> > > > +    rc = saLogStreamOpen_2(logHandle, &alarmStreamName, NULL, 0,
> > > > +                           LOG_OPEN_TIMEOUT, &alarmStreamHandle);
> > > > +    if (rc != SA_AIS_ERR_TRY_AGAIN) break;
> > > > +    osaf_nanosleep(&kHundredMilliseconds);
> > > >    }
> > > > -  if (!first_try) {
> > > > -    LOG_IN("saLogStreamOpen_2 ok");
> > > > -    first_try = 1;
> > > > +
> > > > +  if (rc != SA_AIS_OK) {
> > > > +    LOG_WA("Log saLogStreamOpen_2 failed rc = %d", rc);
> > > >    }
> > > >
> > > >  exit_point:
> > > > +  if (rc != SA_AIS_OK) {
> > > > +    // Finalize client
> > > > +    osaf_set_millis_timeout(kWaitTime/2, &timeout_time);
> > > > +    while (!osaf_is_timeout(&timeout_time)) {
> > > > +      SaAisErrorT ret = saLogFinalize(logHandle);
> > > > +      if (ret != SA_AIS_ERR_TRY_AGAIN) break;
> > > > +      osaf_nanosleep(&kHundredMilliseconds);
> > > > +    }
> > > > +    log_client_initialized = false;
> > > > +  }
> > > >    TRACE_LEAVE();
> > > > -  return (result);
> > > > +  return rc;
> > > >  }
> > > >
> > > >  void logEvent() {
> > > > -  SaAisErrorT errorCode;
> > > > -  errorCode = saLogDispatch(logHandle, SA_DISPATCH_ALL);
> > > > -  if (SA_AIS_OK != errorCode) {
> > > > -    TRACE_1("Failed to dispatch log events (%d)", errorCode);
> > > > +  SaAisErrorT rc;
> > > > +  rc = saLogDispatch(logHandle, SA_DISPATCH_ALL);
> > > > +  if (rc != SA_AIS_OK) {
> > > > +    LOG_WA("Fail to dispatch log events rc = %d", rc);
> > > > +    if (rc == SA_AIS_ERR_BAD_HANDLE) {
> > > > +      log_client_initialized = false;
> > > > +    }
> > > >    }
> > > >    return;
> > > >  }
> > > > diff --git a/src/ntf/ntfd/NtfLogger.h b/src/ntf/ntfd/NtfLogger.h
> > > > index cc0ac7dba..739b97511 100644
> > > > --- a/src/ntf/ntfd/NtfLogger.h
> > > > +++ b/src/ntf/ntfd/NtfLogger.h
> > > > @@ -56,14 +56,14 @@ class NtfLogger {
> > > >    //    virtual ~NtfLogger();
> > > >
> > > >    void log(NtfSmartPtr& notif, bool isLocal);
> > > > -  void checkQueueAndLog(NtfSmartPtr& notif);
> > > > -  SaAisErrorT logNotification(NtfSmartPtr& notif);
> > > >    void queueNotifcation(NtfSmartPtr& notif);
> > > >    void printInfo();
> > > >    void syncRequest(NCS_UBAID *uba);
> > > >
> > > >   private:
> > > >    SaAisErrorT initLog();
> > > > +  void checkQueueAndLog(NtfSmartPtr& notif);
> > > > +  SaAisErrorT logNotification(NtfSmartPtr& notif);
> > > >
> > > >    readerNotificationListT coll_;
> > > >    unsigned int readCounter;
> > >
> > >
> > >
> > >
----------------------------------------------------------------------------
--
> > > Check out the vibrant tech community on one of the world's most
> > > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> > > _______________________________________________
> > > Opensaf-devel mailing list
> > > Opensaf-devel@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/opensaf-devel


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to