One more comment: * saf_logger.c needs to be updated to handle the error code SA_AIS_ERR_TIMEOUT
regards, Anders Widell 2013-08-16 12:16, Lennart Lund skrev: > Hi, > > See my comments to Anders W below > > Thanks' > Lennart > >> -----Original Message----- >> From: Anders Widell [mailto:[email protected]] >> Sent: den 16 augusti 2013 10:47 >> To: [email protected] >> Subject: Re: [devel] [PATCH 0 of 3] Review Request for logsv: Fix hanging >> main thread when file i/o dont return [#9] >> >> Hi! >> >> I have some comments: >> >> * Time-out limits shall be configurable (according to our new policy on >> limits) > Already done. See patch, part 12 > >> * I think the use code for killing and re-starting the slave thread is >> overkill and >> should be removed. Unless we know (and have seen) that this solves a real >> problem that can happen in practice. > Ok, I will remove it > >> regards, >> Anders Widell >> >> 2013-08-09 15:20, Lennart Lund skrev: >>> Summary: logsv: Fix hanging main thread when file i/o don't return >>> Review request for Trac Ticket(s): #9 Peer Reviewer(s): Madhurika >>> Koppula, (Anders Widell, Hans Feldt) Pull request to: NA Affected >>> branch(es): devel (4.4) Development branch: <<IF ANY GIVE THE REPO >>> URL>> >>> >>> >>> -------------------------------- >>> Impacted area Impact y/n >>> -------------------------------- >>> Docs n >>> Build system n >>> RPM/packaging n >>> Configuration files n >>> Startup scripts n >>> SAF services y >>> OpenSAF services n >>> Core libraries n >>> Samples n >>> Tests n >>> Other n >>> >>> >>> Comments (indicate scope for each "y" above): >>> --------------------------------------------- >>> In order to protect the log server "main thread" (MT) from hanging if >>> a file operation like write, mkdir etc. does not return, all such >>> operations are done in a separate "file thread" (FT). >>> Functions running in the "Main Thread" (MT) that needs file system >>> operations handle over the execution to the FT when file handling has >>> to be done. Execution is then given back to the MT again. If a file >>> operation does not return FT will hang but MT will time out the FT and >> resume. A timeout will be handled as a file operation fail. >>> The MT can detect if the FT is hanging and new requests for file operations >> will be "failed". >>> Note1: This is an add on to the patches sent out in prevoius review >> requests. >>> Note2: The last patch (part 11); The non block handling of log files that >>> was >> suggested by Madhurika >>> is contained in its' own patch. >>> >>> >>> changeset 43a3e4173f05a01aa595b3d770a1464a3338f32e >>> Author: Lennart Lund <[email protected]> >>> Date: Fri, 09 Aug 2013 13:46:36 +0200 >>> >>> logsv: Fix hanging main thread when file i/o don't return. [#9]. Part >>> 9 >>> >>> - Remove unnecessary data copying in log_file_api() and >> file_hndl_thread() >>> - Return SA_AIS_ERR_TIMEOUT if the write operation time out when >> a log >>> record shall be written. If the file thread is already "hanging" when a >>> write is requested no attempt to write is made and >> SA_AIS_ERR_TRY_AGAIN is >>> returned as before. >>> - Try to recover file thread by recreating it if it hangs for a long >>> time. >>> - Recover if bad file descriptor or stale NFS handle. >>> >>> changeset d3d78dc3ad87e083e411e0ce5436bcca511d54d6 >>> Author: Lennart Lund <[email protected]> >>> Date: Fri, 09 Aug 2013 13:46:36 +0200 >>> >>> logsv: Fix hanging main thread when file i/o don't return. [#9]. Part >>> 10 >>> >>> - Always reinitialize/reopen log files if a write operation fails, >>> timeout >>> of file thread (hanging file system) included. >>> - Handle synchronization between nodes when log files cannot be >> created before >>> a switch over without using any new flag that has to be checkpointed >>> (remove "files_initialized" flag) >>> - Incorrect handling of "partial write" is fixed. See #536 >>> >>> changeset b8a2060ff5fd6d2685f95757d422addeca1ebdb0 >>> Author: Lennart Lund <[email protected]> >>> Date: Fri, 09 Aug 2013 13:46:36 +0200 >>> >>> logsv: Fix hanging main thread when file i/o don't return. [#9]. Part >>> 11 >>> >>> - Open log files with O_NONBLOCK. Answer client with >> AIS_ERR_TIMEOUT if >>> EWOULDBLOCK/EAGAIN (record may be parially written) >>> >>> >>> Complete diffstat: >>> ------------------ >>> osaf/services/saf/logsv/lgs/lgs_evt.c | 11 ++-- >>> osaf/services/saf/logsv/lgs/lgs_file.c | 197 >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> +++++------------------------- >>> osaf/services/saf/logsv/lgs/lgs_filehdl.c | 2 +- >>> osaf/services/saf/logsv/lgs/lgs_imm.c | 1 - >>> osaf/services/saf/logsv/lgs/lgs_mbcsv.c | 7 --- >>> osaf/services/saf/logsv/lgs/lgs_mbcsv.h | 3 - >>> osaf/services/saf/logsv/lgs/lgs_stream.c | 115 >> +++++++++++++++++++++------------------------------ >>> osaf/services/saf/logsv/lgs/lgs_stream.h | 1 - >>> 8 files changed, 196 insertions(+), 141 deletions(-) >>> >>> >>> Testing Commands: >>> ----------------- >>> See previous review >>> >>> >>> Testing, Expected Results: >>> -------------------------- >>> <<PASTE COMMAND OUTPUTS / TEST RESULTS>> >>> >>> >>> Conditions of Submission: >>> ------------------------- >>> <<HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC>> >>> >>> >>> Arch Built Started Linux distro >>> ------------------------------------------- >>> mips n n >>> mips64 n n >>> x86 n n >>> x86_64 n n >>> powerpc n n >>> powerpc64 n n >>> >>> >>> Reviewer Checklist: >>> ------------------- >>> [Submitters: make sure that your review doesn't trigger any >>> checkmarks!] >>> >>> >>> Your checkin has not passed review because (see checked entries): >>> >>> ___ Your RR template is generally incomplete; it has too many blank entries >>> that need proper data filled in. >>> >>> ___ You have failed to nominate the proper persons for review and push. >>> >>> ___ Your patches do not have proper short+long header >>> >>> ___ You have grammar/spelling in your header that is unacceptable. >>> >>> ___ You have exceeded a sensible line length in your >> headers/comments/text. >>> ___ You have failed to put in a proper Trac Ticket # into your commits. >>> >>> ___ You have incorrectly put/left internal data in your comments/files >>> (i.e. internal bug tracking tool IDs, product names etc) >>> >>> ___ You have not given any evidence of testing beyond basic build tests. >>> Demonstrate some level of runtime or other sanity testing. >>> >>> ___ You have ^M present in some of your files. These have to be removed. >>> >>> ___ You have needlessly changed whitespace or added whitespace crimes >>> like trailing spaces, or spaces before tabs. >>> >>> ___ You have mixed real technical changes with whitespace and other >>> cosmetic code cleanup changes. These have to be separate commits. >>> >>> ___ You need to refactor your submission into logical chunks; there is >>> too much content into a single commit. >>> >>> ___ You have extraneous garbage in your review (merge commits etc) >>> >>> ___ You have giant attachments which should never have been sent; >>> Instead you should place your content in a public tree to be pulled. >>> >>> ___ You have too many commits attached to an e-mail; resend as threaded >>> commits, or place in a public tree for a pull. >>> >>> ___ You have resent this content multiple times without a clear indication >>> of what has changed between each re-send. >>> >>> ___ You have failed to adequately and individually address all of the >>> comments and change requests that were proposed in the initial review. >>> >>> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc) >>> >>> ___ Your computer have a badly configured date and time; confusing the >>> the threaded patch review. >>> >>> ___ Your changes affect IPC mechanism, and you don't present any results >>> for in-service upgradability test. >>> >>> ___ Your changes affect user manual and documentation, your patch series >>> do not contain the patch that updates the Doxygen manual. >>> >>> >>> ---------------------------------------------------------------------- >>> -------- Get 100% visibility into Java/.NET code with AppDynamics >>> Lite! >>> It's a free troubleshooting tool designed for production. >>> Get down to code-level detail for bottlenecks, with <2% overhead. >>> Download for free and get started troubleshooting in minutes. >>> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.c >>> lktrk _______________________________________________ >>> Opensaf-devel mailing list >>> [email protected] >>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel >>> >>> >> >> ------------------------------------------------------------------------------ >> Get 100% visibility into Java/.NET code with AppDynamics Lite! >> It's a free troubleshooting tool designed for production. >> Get down to code-level detail for bottlenecks, with <2% overhead. >> Download for free and get started troubleshooting in minutes. >> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clk >> trk >> _______________________________________________ >> Opensaf-devel mailing list >> [email protected] >> https://lists.sourceforge.net/lists/listinfo/opensaf-devel ------------------------------------------------------------------------------ Get 100% visibility into Java/.NET code with AppDynamics Lite! It's a free troubleshooting tool designed for production. Get down to code-level detail for bottlenecks, with <2% overhead. Download for free and get started troubleshooting in minutes. http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk _______________________________________________ Opensaf-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensaf-devel
