- Description has changed:

Diff:

~~~~

--- old
+++ new
@@ -71,11 +71,11 @@
 Here is the log:
 
 Aug 22  7:43:05.038105 osaflogd [4110:lgs_stream.c:0740] >> log_file_open 
-Aug 22  7:43:05.038111 osaflogd [4110:lgs_stream.c:0750] TR log_file_open - 
Opening file 
"/storage/no-backup/coremw/var/log/saflog/test/test_20140822_074304.log"
+Aug 22  7:43:05.038111 osaflogd [4110:lgs_stream.c:0750] TR log_file_open - 
Opening file 
"/storage/no-backup/xxx/var/log/saflog/test/test_20140822_074304.log"
 Aug 22  7:43:05.038116 osaflogd [4110:lgs_stream.c:0063] >> fileopen_h 
-Aug 22  7:43:05.038121 osaflogd [4110:lgs_stream.c:0074] TR fileopen_h - 
filepath 
"/storage/no-backup/coremw/var/log/saflog/test/test_20140822_074304.log"
+Aug 22  7:43:05.038121 osaflogd [4110:lgs_stream.c:0074] TR fileopen_h - 
filepath "/storage/no-backup/xxx/var/log/saflog/test/test_20140822_074304.log"
 Aug 22  7:43:05.038393 osaflogd [4110:lgs_filehdl.c:0448] >> fileopen_hdl 
-Aug 22  7:43:05.038417 osaflogd [4110:lgs_filehdl.c:0450] TR fileopen_hdl - 
filepath 
"/storage/no-backup/coremw/var/log/saflog/test/test_20140822_074304.log"
+Aug 22  7:43:05.038417 osaflogd [4110:lgs_filehdl.c:0450] TR fileopen_hdl - 
filepath "/storage/no-backup/xxx/var/log/saflog/test/test_20140822_074304.log"
 …
 Aug 22  7:43:05.539345 osaflogd [4110:lgs_file.c:0342] TR Timed out before 
answer
 Aug 22  7:43:05.558948 osaflogd [4110:lgs_stream.c:0085] TR fileopen_h - API 
error LGSF_TIMEOUT

~~~~

- **Blocker**:  --> False



---

** [tickets:#1042] LOG: out of control file descriptor of log file cause .nfs 
file generate when log file deleted by rotation mechanism**

**Status:** fixed
**Milestone:** 4.4.1
**Created:** Thu Sep 04, 2014 02:30 AM UTC by Thuan
**Last Updated:** Thu Sep 25, 2014 06:51 AM UTC
**Owner:** elunlen


Out of control of file descriptor in LOG service cause many .nfs file generate. 
The problem can be reproduced by following steps:
(1) Make a sample to writing log to one log stream.
(2) Execute sample until saLogStreamNumOpeners > 100
(3) Check the applog folder, some .nfs files exist in applog folder, not 
frequently happen

Base on my LOG trace, the problem may cause by closing file unsuccessfully but 
no checking return code. Here is the log:

Aug 21  6:10:35.015781 osaflogd [4384:lgs_file.c:0287] TR log_file_api - 
LGSF_BUSY
Aug 21  6:10:35.015786 osaflogd [4384:lgs_stream.c:1233] TR log_stream_write_h 
- API error LGSF_BUSY
Aug 21  6:10:35.015794 osaflogd [4384:lgs_stream.c:1247] IN write 
'test_20140821_061033' failed "Success"
Aug 21  6:10:35.015799 osaflogd [4384:lgs_stream.c:0108] >> fileclose_h: fd=27
Aug 21  6:10:35.015902 osaflogd [4384:lgs_file.c:0287] TR log_file_api - 
LGSF_BUSY
Aug 21  6:10:35.015914 osaflogd [4384:lgs_stream.c:0119] TR fileclose_h - API 
error LGSF_BUSY
Aug 21  6:10:35.015919 osaflogd [4384:lgs_stream.c:0125] << fileclose_h: rc = -1
Aug 21  6:10:35.015924 osaflogd [4384:lgs_stream.c:1275] << log_stream_write_h: 
rc=-1
…
Aug 21  6:10:35.016280 osaflogd [4384:lgs_stream.c:1188] >> log_stream_write_h: 
safLgStr=testLogStream1,safApp=safLogService
Aug 21  6:10:35.016285 osaflogd [4384:lgs_stream.c:0293] >> 
log_initiate_stream_files

Here is the code of log_stream_write_h() function:

        if (*stream->p_fd == -1) {
                /* Create directory and log files if they were not created at
                 * stream open or reopen files if bad file descriptor.
                 */
                log_initiate_stream_files(stream);
        …
        if ((rc == -1) || (rc == -2)) {
                /* If writing failed always invalidate the stream file 
descriptor.
                 */
                /* Careful with log level here to avoid syslog flooding */
                LOG_IN("write '%s' failed \"%s\"", stream->logFileCurrent,
                                strerror(write_errno));

                if (*stream->p_fd != -1) {
                        /* Try to close the file and invalidate the stream fd */
                        fileclose_h(*stream->p_fd); //===>No check return code
                        *stream->p_fd = -1;
                }
        …

So, there is one file descriptor is opened out of control of LOG service.
If LOG service open log file with new fd, when log file rotate mechanism delete 
it, .nfs file generated.

I also found .nfs file with size < max size. Because file rename done, then 
open new log file time out, actually success, but variable "curFileSize" is not 
reset. In next write, log file is open again (2nd fd), will close after 1st 
write due to "curFileSize" > max file size. This small log file will become 
.nfs when rotation mechanism delete it. So, I think the variable "curFileSize" 
should reset 0 after rename succeed and consider about open file api. Here is 
the code:

                /* Close current log file */
                if ((rc = fileclose_h(*stream->p_fd)) == -1) {
                        LOG_IN("close FAILED: %s", strerror(errno));
                        goto done;
                }
                *stream->p_fd = -1;
                /* Rename file to give it the "close timestamp" */
                rc = lgs_file_rename_h(stream->pathName, stream->logFileCurrent,
                                current_time, LGS_LOG_FILE_EXT, NULL);
                if (rc == -1)
                        goto done;

                /* Create a new file name that includes "open time stamp" and 
open the file */
                snprintf(stream->logFileCurrent, NAME_MAX, "%s_%s", 
stream->fileName,
                                current_time);
                if ((*stream->p_fd = log_file_open(stream, 
stream->logFileCurrent, //==>open time out but actually file opened cause fd 
out of control
                                &errno_save)) == -1) {
                        LOG_IN("Could not open '%s' - %s", 
stream->logFileCurrent,
                                        strerror(errno_save));
                        rc = -1;
                        goto done;
                }
                stream->curFileSize = 0; //==>Reset size here should move to 
after rename done

Here is the log:

Aug 22  7:43:05.038105 osaflogd [4110:lgs_stream.c:0740] >> log_file_open 
Aug 22  7:43:05.038111 osaflogd [4110:lgs_stream.c:0750] TR log_file_open - 
Opening file 
"/storage/no-backup/xxx/var/log/saflog/test/test_20140822_074304.log"
Aug 22  7:43:05.038116 osaflogd [4110:lgs_stream.c:0063] >> fileopen_h 
Aug 22  7:43:05.038121 osaflogd [4110:lgs_stream.c:0074] TR fileopen_h - 
filepath "/storage/no-backup/xxx/var/log/saflog/test/test_20140822_074304.log"
Aug 22  7:43:05.038393 osaflogd [4110:lgs_filehdl.c:0448] >> fileopen_hdl 
Aug 22  7:43:05.038417 osaflogd [4110:lgs_filehdl.c:0450] TR fileopen_hdl - 
filepath "/storage/no-backup/xxx/var/log/saflog/test/test_20140822_074304.log"
…
Aug 22  7:43:05.539345 osaflogd [4110:lgs_file.c:0342] TR Timed out before 
answer
Aug 22  7:43:05.558948 osaflogd [4110:lgs_stream.c:0085] TR fileopen_h - API 
error LGSF_TIMEOUT
Aug 22  7:43:05.558987 osaflogd [4110:lgs_stream.c:0092] << fileopen_h 
Aug 22  7:43:05.559000 osaflogd [4110:lgs_stream.c:0757] << log_file_open 
Aug 22  7:43:05.559022 osaflogd [4110:lgs_stream.c:1151] IN Could not open 
'test_20140822_074304' - Success
Aug 22  7:43:05.559035 osaflogd [4110:lgs_stream.c:1275] << log_stream_write_h: 
rc=-1

**So, there are 3 points need to solve:**
(1) Closing file unsuccessfully but no checking return code.
(2) The current file size should reset after rename done.
(3) Open file time out but file already open cause fd out of control


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
_______________________________________________
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

Reply via email to