Hi,

Is it ok to just add SA_AIS_ERR_TIMEOUT to the try again loop (poll_retry)?

Existing code:
        if (cb_error == SA_AIS_ERR_TRY_AGAIN) {
                usleep(100000); /* 100 ms */
                try_agains++;
                goto retry;
        }

Changed code:
retry:
        if ((cb_error == SA_AIS_ERR_TRY_AGAIN) || (errorCode == 
SA_AIS_ERR_TIMEOUT)) {
                usleep(100000); /* 100 ms */
                try_agains++;
                goto retry;
        }

Thanks'
Lennart

> -----Original Message-----
> From: Anders Widell
> Sent: den 16 augusti 2013 12:25
> To: Lennart Lund
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [devel] [PATCH 0 of 3] Review Request for logsv: Fix hanging
> main thread when file i/o dont return [#9]
> 
> One more comment:
> 
> * saf_logger.c needs to be updated to handle the error code
> SA_AIS_ERR_TIMEOUT
> 
> regards,
> Anders Widell
> 
> 2013-08-16 12:16, Lennart Lund skrev:
> > Hi,
> >
> > See my comments to Anders W below
> >
> > Thanks'
> > Lennart
> >
> >> -----Original Message-----
> >> From: Anders Widell [mailto:anders.wid...@ericsson.com]
> >> Sent: den 16 augusti 2013 10:47
> >> To: opensaf-devel@lists.sourceforge.net
> >> Subject: Re: [devel] [PATCH 0 of 3] Review Request for logsv: Fix
> >> hanging main thread when file i/o dont return [#9]
> >>
> >> Hi!
> >>
> >> I have some comments:
> >>
> >> * Time-out limits shall be configurable (according to our new policy
> >> on
> >> limits)
> > Already done. See patch, part 12
> >
> >> * I think the use code for killing and re-starting the slave thread
> >> is overkill and should be removed. Unless we know (and have seen)
> >> that this solves a real problem that can happen in practice.
> > Ok, I will remove it
> >
> >> regards,
> >> Anders Widell
> >>
> >> 2013-08-09 15:20, Lennart Lund skrev:
> >>> Summary: logsv: Fix hanging main thread when file i/o don't return
> >>> Review request for Trac Ticket(s): #9 Peer Reviewer(s): Madhurika
> >>> Koppula, (Anders Widell, Hans Feldt) Pull request to: NA Affected
> >>> branch(es): devel (4.4) Development branch: <<IF ANY GIVE THE REPO
> >>> URL>>
> >>>
> >>>
> >>> --------------------------------
> >>> Impacted area       Impact y/n
> >>> --------------------------------
> >>>    Docs                    n
> >>>    Build system            n
> >>>    RPM/packaging           n
> >>>    Configuration files     n
> >>>    Startup scripts         n
> >>>    SAF services            y
> >>>    OpenSAF services        n
> >>>    Core libraries          n
> >>>    Samples                 n
> >>>    Tests                   n
> >>>    Other                   n
> >>>
> >>>
> >>> Comments (indicate scope for each "y" above):
> >>> ---------------------------------------------
> >>> In order to protect the log server "main thread" (MT) from hanging
> >>> if a file operation like write, mkdir etc. does not return, all such
> >>> operations are done in a separate "file thread" (FT).
> >>> Functions running in the "Main Thread" (MT) that needs file system
> >>> operations handle over the execution to the FT when file handling
> >>> has to be done. Execution is then given back to the MT again. If a
> >>> file operation does not return FT will hang but MT will time out the
> >>> FT and
> >> resume. A timeout will be handled as a file operation fail.
> >>> The MT can detect if the FT is hanging and new requests for file
> >>> operations
> >> will be "failed".
> >>> Note1: This is an add on to the patches sent out in prevoius review
> >> requests.
> >>> Note2: The last patch (part 11); The non block handling of log files
> >>> that was
> >> suggested by Madhurika
> >>>          is contained in its' own patch.
> >>>
> >>>
> >>> changeset 43a3e4173f05a01aa595b3d770a1464a3338f32e
> >>> Author:   Lennart Lund <lennart.l...@ericsson.com>
> >>> Date:     Fri, 09 Aug 2013 13:46:36 +0200
> >>>
> >>>   logsv: Fix hanging main thread when file i/o don't return. [#9].
> >>> Part
> >>> 9
> >>>
> >>>   - Remove unnecessary data copying in log_file_api() and
> >> file_hndl_thread()
> >>>   - Return SA_AIS_ERR_TIMEOUT if the write operation time out when
> >> a log
> >>>   record shall be written. If the file thread is already "hanging" when a
> >>>   write is requested no attempt to write is made and
> >> SA_AIS_ERR_TRY_AGAIN is
> >>>   returned as before.
> >>>   - Try to recover file thread by recreating it if it hangs for a long 
> >>> time.
> >>>   - Recover if bad file descriptor or stale NFS handle.
> >>>
> >>> changeset d3d78dc3ad87e083e411e0ce5436bcca511d54d6
> >>> Author:   Lennart Lund <lennart.l...@ericsson.com>
> >>> Date:     Fri, 09 Aug 2013 13:46:36 +0200
> >>>
> >>>   logsv: Fix hanging main thread when file i/o don't return. [#9].
> >>> Part
> >>> 10
> >>>
> >>>   - Always reinitialize/reopen log files if a write operation fails, 
> >>> timeout
> >>>   of file thread (hanging file system) included.
> >>>   - Handle synchronization between nodes when log files cannot be
> >> created before
> >>>   a switch over without using any new flag that has to be checkpointed
> >>>   (remove "files_initialized" flag)
> >>>   - Incorrect handling of "partial write" is fixed. See #536
> >>>
> >>> changeset b8a2060ff5fd6d2685f95757d422addeca1ebdb0
> >>> Author:   Lennart Lund <lennart.l...@ericsson.com>
> >>> Date:     Fri, 09 Aug 2013 13:46:36 +0200
> >>>
> >>>   logsv: Fix hanging main thread when file i/o don't return. [#9].
> >>> Part
> >>> 11
> >>>
> >>>   - Open log files with O_NONBLOCK. Answer client with
> >> AIS_ERR_TIMEOUT if
> >>>   EWOULDBLOCK/EAGAIN (record may be parially written)
> >>>
> >>>
> >>> Complete diffstat:
> >>> ------------------
> >>>    osaf/services/saf/logsv/lgs/lgs_evt.c     |   11 ++--
> >>>    osaf/services/saf/logsv/lgs/lgs_file.c    |  197
> >>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> +++++-------------------------
> >>>    osaf/services/saf/logsv/lgs/lgs_filehdl.c |    2 +-
> >>>    osaf/services/saf/logsv/lgs/lgs_imm.c     |    1 -
> >>>    osaf/services/saf/logsv/lgs/lgs_mbcsv.c   |    7 ---
> >>>    osaf/services/saf/logsv/lgs/lgs_mbcsv.h   |    3 -
> >>>    osaf/services/saf/logsv/lgs/lgs_stream.c  |  115
> >> +++++++++++++++++++++------------------------------
> >>>    osaf/services/saf/logsv/lgs/lgs_stream.h  |    1 -
> >>>    8 files changed, 196 insertions(+), 141 deletions(-)
> >>>
> >>>
> >>> Testing Commands:
> >>> -----------------
> >>> See previous review
> >>>
> >>>
> >>> Testing, Expected Results:
> >>> --------------------------
> >>>    <<PASTE COMMAND OUTPUTS / TEST RESULTS>>
> >>>
> >>>
> >>> Conditions of Submission:
> >>> -------------------------
> >>>    <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>>
> >>>
> >>>
> >>> Arch      Built     Started    Linux distro
> >>> -------------------------------------------
> >>> mips        n          n
> >>> mips64      n          n
> >>> x86         n          n
> >>> x86_64      n          n
> >>> powerpc     n          n
> >>> powerpc64   n          n
> >>>
> >>>
> >>> Reviewer Checklist:
> >>> -------------------
> >>> [Submitters: make sure that your review doesn't trigger any
> >>> checkmarks!]
> >>>
> >>>
> >>> Your checkin has not passed review because (see checked entries):
> >>>
> >>> ___ Your RR template is generally incomplete; it has too many blank
> entries
> >>>       that need proper data filled in.
> >>>
> >>> ___ You have failed to nominate the proper persons for review and
> push.
> >>>
> >>> ___ Your patches do not have proper short+long header
> >>>
> >>> ___ You have grammar/spelling in your header that is unacceptable.
> >>>
> >>> ___ You have exceeded a sensible line length in your
> >> headers/comments/text.
> >>> ___ You have failed to put in a proper Trac Ticket # into your commits.
> >>>
> >>> ___ You have incorrectly put/left internal data in your comments/files
> >>>       (i.e. internal bug tracking tool IDs, product names etc)
> >>>
> >>> ___ You have not given any evidence of testing beyond basic build tests.
> >>>       Demonstrate some level of runtime or other sanity testing.
> >>>
> >>> ___ You have ^M present in some of your files. These have to be
> removed.
> >>>
> >>> ___ You have needlessly changed whitespace or added whitespace
> crimes
> >>>       like trailing spaces, or spaces before tabs.
> >>>
> >>> ___ You have mixed real technical changes with whitespace and other
> >>>       cosmetic code cleanup changes. These have to be separate commits.
> >>>
> >>> ___ You need to refactor your submission into logical chunks; there is
> >>>       too much content into a single commit.
> >>>
> >>> ___ You have extraneous garbage in your review (merge commits etc)
> >>>
> >>> ___ You have giant attachments which should never have been sent;
> >>>       Instead you should place your content in a public tree to be pulled.
> >>>
> >>> ___ You have too many commits attached to an e-mail; resend as
> threaded
> >>>       commits, or place in a public tree for a pull.
> >>>
> >>> ___ You have resent this content multiple times without a clear
> indication
> >>>       of what has changed between each re-send.
> >>>
> >>> ___ You have failed to adequately and individually address all of the
> >>>       comments and change requests that were proposed in the initial
> review.
> >>>
> >>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
> >>>
> >>> ___ Your computer have a badly configured date and time; confusing
> the
> >>>       the threaded patch review.
> >>>
> >>> ___ Your changes affect IPC mechanism, and you don't present any
> results
> >>>       for in-service upgradability test.
> >>>
> >>> ___ Your changes affect user manual and documentation, your patch
> series
> >>>       do not contain the patch that updates the Doxygen manual.
> >>>
> >>>
> >>> --------------------------------------------------------------------
> >>> --
> >>> -------- Get 100% visibility into Java/.NET code with AppDynamics
> >>> Lite!
> >>> It's a free troubleshooting tool designed for production.
> >>> Get down to code-level detail for bottlenecks, with <2% overhead.
> >>> Download for free and get started troubleshooting in minutes.
> >>>
> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg
> >>> .c lktrk _______________________________________________
> >>> Opensaf-devel mailing list
> >>> Opensaf-devel@lists.sourceforge.net
> >>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
> >>>
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> --------- Get 100% visibility into Java/.NET code with AppDynamics
> >> Lite!
> >> It's a free troubleshooting tool designed for production.
> >> Get down to code-level detail for bottlenecks, with <2% overhead.
> >> Download for free and get started troubleshooting in minutes.
> >>
> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.
> >> clk
> >> trk
> >> _______________________________________________
> >> Opensaf-devel mailing list
> >> Opensaf-devel@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/opensaf-devel


------------------------------------------------------------------------------
Get 100% visibility into Java/.NET code with AppDynamics Lite!
It's a free troubleshooting tool designed for production.
Get down to code-level detail for bottlenecks, with <2% overhead. 
Download for free and get started troubleshooting in minutes. 
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to