Hi,
Is it ok to just add SA_AIS_ERR_TIMEOUT to the try again loop (poll_retry)?
Existing code:
if (cb_error == SA_AIS_ERR_TRY_AGAIN) {
usleep(100000); /* 100 ms */
try_agains++;
goto retry;
}
Changed code:
retry:
if ((cb_error == SA_AIS_ERR_TRY_AGAIN) || (errorCode ==
SA_AIS_ERR_TIMEOUT)) {
usleep(100000); /* 100 ms */
try_agains++;
goto retry;
}
Thanks'
Lennart
> -----Original Message-----
> From: Anders Widell
> Sent: den 16 augusti 2013 12:25
> To: Lennart Lund
> Cc: [email protected]
> Subject: Re: [devel] [PATCH 0 of 3] Review Request for logsv: Fix hanging
> main thread when file i/o dont return [#9]
>
> One more comment:
>
> * saf_logger.c needs to be updated to handle the error code
> SA_AIS_ERR_TIMEOUT
>
> regards,
> Anders Widell
>
> 2013-08-16 12:16, Lennart Lund skrev:
> > Hi,
> >
> > See my comments to Anders W below
> >
> > Thanks'
> > Lennart
> >
> >> -----Original Message-----
> >> From: Anders Widell [mailto:[email protected]]
> >> Sent: den 16 augusti 2013 10:47
> >> To: [email protected]
> >> Subject: Re: [devel] [PATCH 0 of 3] Review Request for logsv: Fix
> >> hanging main thread when file i/o dont return [#9]
> >>
> >> Hi!
> >>
> >> I have some comments:
> >>
> >> * Time-out limits shall be configurable (according to our new policy
> >> on
> >> limits)
> > Already done. See patch, part 12
> >
> >> * I think the use code for killing and re-starting the slave thread
> >> is overkill and should be removed. Unless we know (and have seen)
> >> that this solves a real problem that can happen in practice.
> > Ok, I will remove it
> >
> >> regards,
> >> Anders Widell
> >>
> >> 2013-08-09 15:20, Lennart Lund skrev:
> >>> Summary: logsv: Fix hanging main thread when file i/o don't return
> >>> Review request for Trac Ticket(s): #9 Peer Reviewer(s): Madhurika
> >>> Koppula, (Anders Widell, Hans Feldt) Pull request to: NA Affected
> >>> branch(es): devel (4.4) Development branch: <<IF ANY GIVE THE REPO
> >>> URL>>
> >>>
> >>>
> >>> --------------------------------
> >>> Impacted area Impact y/n
> >>> --------------------------------
> >>> Docs n
> >>> Build system n
> >>> RPM/packaging n
> >>> Configuration files n
> >>> Startup scripts n
> >>> SAF services y
> >>> OpenSAF services n
> >>> Core libraries n
> >>> Samples n
> >>> Tests n
> >>> Other n
> >>>
> >>>
> >>> Comments (indicate scope for each "y" above):
> >>> ---------------------------------------------
> >>> In order to protect the log server "main thread" (MT) from hanging
> >>> if a file operation like write, mkdir etc. does not return, all such
> >>> operations are done in a separate "file thread" (FT).
> >>> Functions running in the "Main Thread" (MT) that needs file system
> >>> operations handle over the execution to the FT when file handling
> >>> has to be done. Execution is then given back to the MT again. If a
> >>> file operation does not return FT will hang but MT will time out the
> >>> FT and
> >> resume. A timeout will be handled as a file operation fail.
> >>> The MT can detect if the FT is hanging and new requests for file
> >>> operations
> >> will be "failed".
> >>> Note1: This is an add on to the patches sent out in prevoius review
> >> requests.
> >>> Note2: The last patch (part 11); The non block handling of log files
> >>> that was
> >> suggested by Madhurika
> >>> is contained in its' own patch.
> >>>
> >>>
> >>> changeset 43a3e4173f05a01aa595b3d770a1464a3338f32e
> >>> Author: Lennart Lund <[email protected]>
> >>> Date: Fri, 09 Aug 2013 13:46:36 +0200
> >>>
> >>> logsv: Fix hanging main thread when file i/o don't return. [#9].
> >>> Part
> >>> 9
> >>>
> >>> - Remove unnecessary data copying in log_file_api() and
> >> file_hndl_thread()
> >>> - Return SA_AIS_ERR_TIMEOUT if the write operation time out when
> >> a log
> >>> record shall be written. If the file thread is already "hanging" when a
> >>> write is requested no attempt to write is made and
> >> SA_AIS_ERR_TRY_AGAIN is
> >>> returned as before.
> >>> - Try to recover file thread by recreating it if it hangs for a long
> >>> time.
> >>> - Recover if bad file descriptor or stale NFS handle.
> >>>
> >>> changeset d3d78dc3ad87e083e411e0ce5436bcca511d54d6
> >>> Author: Lennart Lund <[email protected]>
> >>> Date: Fri, 09 Aug 2013 13:46:36 +0200
> >>>
> >>> logsv: Fix hanging main thread when file i/o don't return. [#9].
> >>> Part
> >>> 10
> >>>
> >>> - Always reinitialize/reopen log files if a write operation fails,
> >>> timeout
> >>> of file thread (hanging file system) included.
> >>> - Handle synchronization between nodes when log files cannot be
> >> created before
> >>> a switch over without using any new flag that has to be checkpointed
> >>> (remove "files_initialized" flag)
> >>> - Incorrect handling of "partial write" is fixed. See #536
> >>>
> >>> changeset b8a2060ff5fd6d2685f95757d422addeca1ebdb0
> >>> Author: Lennart Lund <[email protected]>
> >>> Date: Fri, 09 Aug 2013 13:46:36 +0200
> >>>
> >>> logsv: Fix hanging main thread when file i/o don't return. [#9].
> >>> Part
> >>> 11
> >>>
> >>> - Open log files with O_NONBLOCK. Answer client with
> >> AIS_ERR_TIMEOUT if
> >>> EWOULDBLOCK/EAGAIN (record may be parially written)
> >>>
> >>>
> >>> Complete diffstat:
> >>> ------------------
> >>> osaf/services/saf/logsv/lgs/lgs_evt.c | 11 ++--
> >>> osaf/services/saf/logsv/lgs/lgs_file.c | 197
> >>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> +++++-------------------------
> >>> osaf/services/saf/logsv/lgs/lgs_filehdl.c | 2 +-
> >>> osaf/services/saf/logsv/lgs/lgs_imm.c | 1 -
> >>> osaf/services/saf/logsv/lgs/lgs_mbcsv.c | 7 ---
> >>> osaf/services/saf/logsv/lgs/lgs_mbcsv.h | 3 -
> >>> osaf/services/saf/logsv/lgs/lgs_stream.c | 115
> >> +++++++++++++++++++++------------------------------
> >>> osaf/services/saf/logsv/lgs/lgs_stream.h | 1 -
> >>> 8 files changed, 196 insertions(+), 141 deletions(-)
> >>>
> >>>
> >>> Testing Commands:
> >>> -----------------
> >>> See previous review
> >>>
> >>>
> >>> Testing, Expected Results:
> >>> --------------------------
> >>> <<PASTE COMMAND OUTPUTS / TEST RESULTS>>
> >>>
> >>>
> >>> Conditions of Submission:
> >>> -------------------------
> >>> <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>>
> >>>
> >>>
> >>> Arch Built Started Linux distro
> >>> -------------------------------------------
> >>> mips n n
> >>> mips64 n n
> >>> x86 n n
> >>> x86_64 n n
> >>> powerpc n n
> >>> powerpc64 n n
> >>>
> >>>
> >>> Reviewer Checklist:
> >>> -------------------
> >>> [Submitters: make sure that your review doesn't trigger any
> >>> checkmarks!]
> >>>
> >>>
> >>> Your checkin has not passed review because (see checked entries):
> >>>
> >>> ___ Your RR template is generally incomplete; it has too many blank
> entries
> >>> that need proper data filled in.
> >>>
> >>> ___ You have failed to nominate the proper persons for review and
> push.
> >>>
> >>> ___ Your patches do not have proper short+long header
> >>>
> >>> ___ You have grammar/spelling in your header that is unacceptable.
> >>>
> >>> ___ You have exceeded a sensible line length in your
> >> headers/comments/text.
> >>> ___ You have failed to put in a proper Trac Ticket # into your commits.
> >>>
> >>> ___ You have incorrectly put/left internal data in your comments/files
> >>> (i.e. internal bug tracking tool IDs, product names etc)
> >>>
> >>> ___ You have not given any evidence of testing beyond basic build tests.
> >>> Demonstrate some level of runtime or other sanity testing.
> >>>
> >>> ___ You have ^M present in some of your files. These have to be
> removed.
> >>>
> >>> ___ You have needlessly changed whitespace or added whitespace
> crimes
> >>> like trailing spaces, or spaces before tabs.
> >>>
> >>> ___ You have mixed real technical changes with whitespace and other
> >>> cosmetic code cleanup changes. These have to be separate commits.
> >>>
> >>> ___ You need to refactor your submission into logical chunks; there is
> >>> too much content into a single commit.
> >>>
> >>> ___ You have extraneous garbage in your review (merge commits etc)
> >>>
> >>> ___ You have giant attachments which should never have been sent;
> >>> Instead you should place your content in a public tree to be pulled.
> >>>
> >>> ___ You have too many commits attached to an e-mail; resend as
> threaded
> >>> commits, or place in a public tree for a pull.
> >>>
> >>> ___ You have resent this content multiple times without a clear
> indication
> >>> of what has changed between each re-send.
> >>>
> >>> ___ You have failed to adequately and individually address all of the
> >>> comments and change requests that were proposed in the initial
> review.
> >>>
> >>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
> >>>
> >>> ___ Your computer have a badly configured date and time; confusing
> the
> >>> the threaded patch review.
> >>>
> >>> ___ Your changes affect IPC mechanism, and you don't present any
> results
> >>> for in-service upgradability test.
> >>>
> >>> ___ Your changes affect user manual and documentation, your patch
> series
> >>> do not contain the patch that updates the Doxygen manual.
> >>>
> >>>
> >>> --------------------------------------------------------------------
> >>> --
> >>> -------- Get 100% visibility into Java/.NET code with AppDynamics
> >>> Lite!
> >>> It's a free troubleshooting tool designed for production.
> >>> Get down to code-level detail for bottlenecks, with <2% overhead.
> >>> Download for free and get started troubleshooting in minutes.
> >>>
> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg
> >>> .c lktrk _______________________________________________
> >>> Opensaf-devel mailing list
> >>> [email protected]
> >>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
> >>>
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> --------- Get 100% visibility into Java/.NET code with AppDynamics
> >> Lite!
> >> It's a free troubleshooting tool designed for production.
> >> Get down to code-level detail for bottlenecks, with <2% overhead.
> >> Download for free and get started troubleshooting in minutes.
> >>
> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.
> >> clk
> >> trk
> >> _______________________________________________
> >> Opensaf-devel mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
------------------------------------------------------------------------------
Get 100% visibility into Java/.NET code with AppDynamics Lite!
It's a free troubleshooting tool designed for production.
Get down to code-level detail for bottlenecks, with <2% overhead.
Download for free and get started troubleshooting in minutes.
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel