Hi, Is it ok to just add SA_AIS_ERR_TIMEOUT to the try again loop (poll_retry)?
Existing code: if (cb_error == SA_AIS_ERR_TRY_AGAIN) { usleep(100000); /* 100 ms */ try_agains++; goto retry; } Changed code: retry: if ((cb_error == SA_AIS_ERR_TRY_AGAIN) || (errorCode == SA_AIS_ERR_TIMEOUT)) { usleep(100000); /* 100 ms */ try_agains++; goto retry; } Thanks' Lennart > -----Original Message----- > From: Anders Widell > Sent: den 16 augusti 2013 12:25 > To: Lennart Lund > Cc: opensaf-devel@lists.sourceforge.net > Subject: Re: [devel] [PATCH 0 of 3] Review Request for logsv: Fix hanging > main thread when file i/o dont return [#9] > > One more comment: > > * saf_logger.c needs to be updated to handle the error code > SA_AIS_ERR_TIMEOUT > > regards, > Anders Widell > > 2013-08-16 12:16, Lennart Lund skrev: > > Hi, > > > > See my comments to Anders W below > > > > Thanks' > > Lennart > > > >> -----Original Message----- > >> From: Anders Widell [mailto:anders.wid...@ericsson.com] > >> Sent: den 16 augusti 2013 10:47 > >> To: opensaf-devel@lists.sourceforge.net > >> Subject: Re: [devel] [PATCH 0 of 3] Review Request for logsv: Fix > >> hanging main thread when file i/o dont return [#9] > >> > >> Hi! > >> > >> I have some comments: > >> > >> * Time-out limits shall be configurable (according to our new policy > >> on > >> limits) > > Already done. See patch, part 12 > > > >> * I think the use code for killing and re-starting the slave thread > >> is overkill and should be removed. Unless we know (and have seen) > >> that this solves a real problem that can happen in practice. > > Ok, I will remove it > > > >> regards, > >> Anders Widell > >> > >> 2013-08-09 15:20, Lennart Lund skrev: > >>> Summary: logsv: Fix hanging main thread when file i/o don't return > >>> Review request for Trac Ticket(s): #9 Peer Reviewer(s): Madhurika > >>> Koppula, (Anders Widell, Hans Feldt) Pull request to: NA Affected > >>> branch(es): devel (4.4) Development branch: <<IF ANY GIVE THE REPO > >>> URL>> > >>> > >>> > >>> -------------------------------- > >>> Impacted area Impact y/n > >>> -------------------------------- > >>> Docs n > >>> Build system n > >>> RPM/packaging n > >>> Configuration files n > >>> Startup scripts n > >>> SAF services y > >>> OpenSAF services n > >>> Core libraries n > >>> Samples n > >>> Tests n > >>> Other n > >>> > >>> > >>> Comments (indicate scope for each "y" above): > >>> --------------------------------------------- > >>> In order to protect the log server "main thread" (MT) from hanging > >>> if a file operation like write, mkdir etc. does not return, all such > >>> operations are done in a separate "file thread" (FT). > >>> Functions running in the "Main Thread" (MT) that needs file system > >>> operations handle over the execution to the FT when file handling > >>> has to be done. Execution is then given back to the MT again. If a > >>> file operation does not return FT will hang but MT will time out the > >>> FT and > >> resume. A timeout will be handled as a file operation fail. > >>> The MT can detect if the FT is hanging and new requests for file > >>> operations > >> will be "failed". > >>> Note1: This is an add on to the patches sent out in prevoius review > >> requests. > >>> Note2: The last patch (part 11); The non block handling of log files > >>> that was > >> suggested by Madhurika > >>> is contained in its' own patch. > >>> > >>> > >>> changeset 43a3e4173f05a01aa595b3d770a1464a3338f32e > >>> Author: Lennart Lund <lennart.l...@ericsson.com> > >>> Date: Fri, 09 Aug 2013 13:46:36 +0200 > >>> > >>> logsv: Fix hanging main thread when file i/o don't return. [#9]. > >>> Part > >>> 9 > >>> > >>> - Remove unnecessary data copying in log_file_api() and > >> file_hndl_thread() > >>> - Return SA_AIS_ERR_TIMEOUT if the write operation time out when > >> a log > >>> record shall be written. If the file thread is already "hanging" when a > >>> write is requested no attempt to write is made and > >> SA_AIS_ERR_TRY_AGAIN is > >>> returned as before. > >>> - Try to recover file thread by recreating it if it hangs for a long > >>> time. > >>> - Recover if bad file descriptor or stale NFS handle. > >>> > >>> changeset d3d78dc3ad87e083e411e0ce5436bcca511d54d6 > >>> Author: Lennart Lund <lennart.l...@ericsson.com> > >>> Date: Fri, 09 Aug 2013 13:46:36 +0200 > >>> > >>> logsv: Fix hanging main thread when file i/o don't return. [#9]. > >>> Part > >>> 10 > >>> > >>> - Always reinitialize/reopen log files if a write operation fails, > >>> timeout > >>> of file thread (hanging file system) included. > >>> - Handle synchronization between nodes when log files cannot be > >> created before > >>> a switch over without using any new flag that has to be checkpointed > >>> (remove "files_initialized" flag) > >>> - Incorrect handling of "partial write" is fixed. See #536 > >>> > >>> changeset b8a2060ff5fd6d2685f95757d422addeca1ebdb0 > >>> Author: Lennart Lund <lennart.l...@ericsson.com> > >>> Date: Fri, 09 Aug 2013 13:46:36 +0200 > >>> > >>> logsv: Fix hanging main thread when file i/o don't return. [#9]. > >>> Part > >>> 11 > >>> > >>> - Open log files with O_NONBLOCK. Answer client with > >> AIS_ERR_TIMEOUT if > >>> EWOULDBLOCK/EAGAIN (record may be parially written) > >>> > >>> > >>> Complete diffstat: > >>> ------------------ > >>> osaf/services/saf/logsv/lgs/lgs_evt.c | 11 ++-- > >>> osaf/services/saf/logsv/lgs/lgs_file.c | 197 > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > >> +++++------------------------- > >>> osaf/services/saf/logsv/lgs/lgs_filehdl.c | 2 +- > >>> osaf/services/saf/logsv/lgs/lgs_imm.c | 1 - > >>> osaf/services/saf/logsv/lgs/lgs_mbcsv.c | 7 --- > >>> osaf/services/saf/logsv/lgs/lgs_mbcsv.h | 3 - > >>> osaf/services/saf/logsv/lgs/lgs_stream.c | 115 > >> +++++++++++++++++++++------------------------------ > >>> osaf/services/saf/logsv/lgs/lgs_stream.h | 1 - > >>> 8 files changed, 196 insertions(+), 141 deletions(-) > >>> > >>> > >>> Testing Commands: > >>> ----------------- > >>> See previous review > >>> > >>> > >>> Testing, Expected Results: > >>> -------------------------- > >>> <<PASTE COMMAND OUTPUTS / TEST RESULTS>> > >>> > >>> > >>> Conditions of Submission: > >>> ------------------------- > >>> <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>> > >>> > >>> > >>> Arch Built Started Linux distro > >>> ------------------------------------------- > >>> mips n n > >>> mips64 n n > >>> x86 n n > >>> x86_64 n n > >>> powerpc n n > >>> powerpc64 n n > >>> > >>> > >>> Reviewer Checklist: > >>> ------------------- > >>> [Submitters: make sure that your review doesn't trigger any > >>> checkmarks!] > >>> > >>> > >>> Your checkin has not passed review because (see checked entries): > >>> > >>> ___ Your RR template is generally incomplete; it has too many blank > entries > >>> that need proper data filled in. > >>> > >>> ___ You have failed to nominate the proper persons for review and > push. > >>> > >>> ___ Your patches do not have proper short+long header > >>> > >>> ___ You have grammar/spelling in your header that is unacceptable. > >>> > >>> ___ You have exceeded a sensible line length in your > >> headers/comments/text. > >>> ___ You have failed to put in a proper Trac Ticket # into your commits. > >>> > >>> ___ You have incorrectly put/left internal data in your comments/files > >>> (i.e. internal bug tracking tool IDs, product names etc) > >>> > >>> ___ You have not given any evidence of testing beyond basic build tests. > >>> Demonstrate some level of runtime or other sanity testing. > >>> > >>> ___ You have ^M present in some of your files. These have to be > removed. > >>> > >>> ___ You have needlessly changed whitespace or added whitespace > crimes > >>> like trailing spaces, or spaces before tabs. > >>> > >>> ___ You have mixed real technical changes with whitespace and other > >>> cosmetic code cleanup changes. These have to be separate commits. > >>> > >>> ___ You need to refactor your submission into logical chunks; there is > >>> too much content into a single commit. > >>> > >>> ___ You have extraneous garbage in your review (merge commits etc) > >>> > >>> ___ You have giant attachments which should never have been sent; > >>> Instead you should place your content in a public tree to be pulled. > >>> > >>> ___ You have too many commits attached to an e-mail; resend as > threaded > >>> commits, or place in a public tree for a pull. > >>> > >>> ___ You have resent this content multiple times without a clear > indication > >>> of what has changed between each re-send. > >>> > >>> ___ You have failed to adequately and individually address all of the > >>> comments and change requests that were proposed in the initial > review. > >>> > >>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc) > >>> > >>> ___ Your computer have a badly configured date and time; confusing > the > >>> the threaded patch review. > >>> > >>> ___ Your changes affect IPC mechanism, and you don't present any > results > >>> for in-service upgradability test. > >>> > >>> ___ Your changes affect user manual and documentation, your patch > series > >>> do not contain the patch that updates the Doxygen manual. > >>> > >>> > >>> -------------------------------------------------------------------- > >>> -- > >>> -------- Get 100% visibility into Java/.NET code with AppDynamics > >>> Lite! > >>> It's a free troubleshooting tool designed for production. > >>> Get down to code-level detail for bottlenecks, with <2% overhead. > >>> Download for free and get started troubleshooting in minutes. > >>> > http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg > >>> .c lktrk _______________________________________________ > >>> Opensaf-devel mailing list > >>> Opensaf-devel@lists.sourceforge.net > >>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel > >>> > >>> > >> > >> --------------------------------------------------------------------- > >> --------- Get 100% visibility into Java/.NET code with AppDynamics > >> Lite! > >> It's a free troubleshooting tool designed for production. > >> Get down to code-level detail for bottlenecks, with <2% overhead. > >> Download for free and get started troubleshooting in minutes. > >> > http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg. > >> clk > >> trk > >> _______________________________________________ > >> Opensaf-devel mailing list > >> Opensaf-devel@lists.sourceforge.net > >> https://lists.sourceforge.net/lists/listinfo/opensaf-devel ------------------------------------------------------------------------------ Get 100% visibility into Java/.NET code with AppDynamics Lite! It's a free troubleshooting tool designed for production. Get down to code-level detail for bottlenecks, with <2% overhead. Download for free and get started troubleshooting in minutes. http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel