Attached are a test file that exercises the various imfile modes, the results of the test, and a consolidated patch going back to f7c20920046ebcb94eadadf1ebad97b634a12a2d (your merge of the other imfile fixes).

It turned out that there were a lot of problems in my code, so there's not much left of my initial patch. This is why I didn't just send in an update to the prior patch.

It would be good to have the config changes broken out as a separate patch (as much for future educational benefit as anything else). I will try to figure out how I can manipulate git to do this.

One other problem I had is that I couldn't make the extra parameter work with ReadMultiLine; it would segfault every time. Since nothing else uses ReadLine, I ended up adding the parameter to ReadLine and eliminating ReadMultiLine completely.

One thing that surprised me is that control characters in imfile don't appear to be escaped the way that they are in logs received over the network. Did I miss some way to enable this? Initially I thought that possibly only \n wasn't escaped, but some of my mistakes generated other control characters.

David Lang


On Thu, 13 Jan 2011, [email protected] wrote:

Date: Thu, 13 Jan 2011 02:02:51 -0800 (PST)
From: [email protected]
Reply-To: rsyslog-users <[email protected]>
To: Rainer Gerhards <[email protected]>
Cc: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] imfile paragraph patch

Thanks. I will try to go over both of these tomorrow, but will definitely do so no later than this weekend.

David Lang

On Thu, 13 Jan 2011, Rainer Gerhards wrote:

Date: Thu, 13 Jan 2011 10:43:37 +0100
From: Rainer Gerhards <[email protected]>
To: [email protected], rsyslog-users <[email protected]>
Subject: Re: imfile paragraph patch

I have now also created a new branch for this patch:
 v5-devel-david-imfile

I added the config variable. See the commit log for useful information and steps. While I was a bit hesitant to merge this patch soon to the official branch (due to the problems I had with imfile), I begin to think this is over-cautious. It should really not harm any existing code. So please let me know when you have finished your testing of the new code, I'll probably merge soon then.
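
For readers following along, here is a hedged sketch of how the new directive might be used in a legacy-format rsyslog.conf. Only $InputFileReadMode comes from this patch; the file path, tag, and state-file name are placeholders chosen for illustration, and the mode values (0, 1, 2) are taken from the comment in strmReadLine(). Because addMonitor() copies readMode into the per-file structure, the directive presumably has to appear before $InputRunFileMonitor for the file it applies to.

```
$ModLoad imfile

# placeholder file, tag and state-file names
$InputFileName /var/log/testfile
$InputFileTag testfile:
$InputFileStateFile stat-testfile

# 0 = one message per line (default)
# 1 = records separated by a blank line
# 2 = indented lines continue the previous record
$InputFileReadMode 2

$InputRunFileMonitor
```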

Thanks!
Rainer

On 12/14/2010 04:57 AM, [email protected] wrote:
I discovered UnreadChar, so mode 2 (indented follow-up lines) now has
a chance of working. Again, this is compile-tested (and visually
code-reviewed by someone else), but not executed.

David Lang

On Mon, 13 Dec 2010, [email protected] wrote:

This is a first cut of a modification to imfile to let it read
multi-line files.

As-is, this should have no effect on a system as it hard-codes the
mode to reading single lines (I really don't understand how to set a
config variable, but for someone who does, it should be simple to
replace the '0' in imfile.c with the value of the config file)

With this config option change, it should be possible to read logfiles
that have blank lines between multi-line log entries and have each
such log entry treated as a single line.
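
The blank-line-delimited behaviour described above can be sketched in plain C on top of stdio. This is an illustration only, not the rsyslog code: read_paragraph() and its fixed-size buffer are hypothetical stand-ins for the strm/cstr objects used in the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of rsyslog: read one blank-line-delimited
 * record from `in` into `buf` (capacity `cap`, always NUL-terminated).
 * LFs inside the record are kept; the terminating blank line is consumed
 * and dropped. A blank line with nothing before it yields an empty record.
 * Returns the record length, or -1 at end of input with nothing read. */
int read_paragraph(FILE *in, char *buf, size_t cap)
{
    size_t len = 0;
    int prev = '\n';            /* start of a record counts as "after LF" */
    int c;

    assert(in != NULL && buf != NULL && cap > 0);

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n' && prev == '\n') {
            /* second LF in a row: the record is complete */
            if (len > 0 && buf[len - 1] == '\n')
                len--;          /* drop the LF that ended the last line */
            buf[len] = '\0';
            return (int)len;
        }
        if (len + 1 < cap)      /* over-long records are silently truncated */
            buf[len++] = (char)c;
        prev = c;
    }

    if (len == 0) {             /* nothing left in the input */
        buf[0] = '\0';
        return -1;
    }
    if (buf[len - 1] == '\n')   /* last record had no trailing blank line */
        len--;
    buf[len] = '\0';
    return (int)len;
}
```

Calling this in a loop until it returns -1 would hand back one multi-line entry per call, with the inner line breaks still embedded in the string.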

I also have code in place (but disabled) to try to deal with the more
complicated layout where all lines after the first one are indented if
they are part of the same log entry. The problem I have is that when I
discover that I have finished reading a log entry, I have already read
the first character of the next log entry. This extra character needs
to be put back into the input buffer, but I don't know whether that is
possible. If it isn't, I need a function that will let me peek at the
next character in the input buffer and make my decision based on that.
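
The read-one-too-many problem described above is the classic case for a one-character pushback. As a rough analogy in standard C (not the rsyslog stream API), ungetc() plays the role that a put-back function on the input stream would; read_indented_record() and its fixed-size buffer are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of rsyslog: read one record in which
 * follow-up lines are marked by a leading space or tab. The LF before a
 * follow-up line is kept in the record; the final LF is dropped.
 * Returns the record length, or -1 at end of input with nothing read. */
int read_indented_record(FILE *in, char *buf, size_t cap)
{
    size_t len = 0;
    int c;

    assert(in != NULL && buf != NULL && cap > 0);

    while ((c = fgetc(in)) != EOF) {
        int next;

        if (c != '\n') {
            if (len + 1 < cap)  /* over-long records are silently truncated */
                buf[len++] = (char)c;
            continue;
        }

        /* end of a line: look at the first character of the next line */
        next = fgetc(in);
        if (next == ' ' || next == '\t') {
            /* follow-up line: keep the LF and the indent in the record */
            if (len + 1 < cap)
                buf[len++] = '\n';
            if (len + 1 < cap)
                buf[len++] = (char)next;
            continue;
        }

        /* that character starts the next record, so put it back */
        if (next != EOF)
            ungetc(next, in);
        buf[len] = '\0';
        return (int)len;
    }

    buf[len] = '\0';
    return len > 0 ? (int)len : -1;
}
```

Standard C only guarantees a single character of pushback, which is all this layout needs: the decision is always made on the first character after an LF.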

This compiles, but I have not tested it anywhere yet. With the
hard-coded mode 0 (LF termination), there should be no change
other than an extra test against a constant for each character read
from a file.

David Lang


_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Jan 17 09:55:19 dlang-laptop kernel: imklog 5.7.2, log source = /proc/kmsg started.
Jan 17 09:55:19 dlang-laptop rsyslogd: [origin software="rsyslogd" swVersion="5.7.2" x-pid="14059" x-info="http://www.rsyslog.com"] start
Jan 17 09:55:19 dlang-laptop testfile: message1-1
Jan 17 09:55:19 dlang-laptop testfile: message1-2
Jan 17 09:55:19 dlang-laptop testfile: message1-3
Jan 17 09:55:19 dlang-laptop testfile: message1-4
Jan 17 09:55:19 dlang-laptop testfile: message2-1
Jan 17 09:55:19 dlang-laptop testfile: message2-2
Jan 17 09:55:19 dlang-laptop testfile: message2-3
Jan 17 09:55:19 dlang-laptop testfile: message2-4
Jan 17 09:55:19 dlang-laptop testfile: message3-1
Jan 17 09:55:19 dlang-laptop testfile: message3-2
Jan 17 09:55:19 dlang-laptop testfile: message3-3
Jan 17 09:55:19 dlang-laptop testfile: message3-4
Jan 17 09:55:23 dlang-laptop kernel: Kernel logging (proc) stopped.
Jan 17 09:55:23 dlang-laptop rsyslogd: [origin software="rsyslogd" swVersion="5.7.2" x-pid="14059" x-info="http://www.rsyslog.com"] exiting on signal 2.
Jan 17 09:56:16 dlang-laptop kernel: imklog 5.7.2, log source = /proc/kmsg started.
Jan 17 09:56:16 dlang-laptop rsyslogd: [origin software="rsyslogd" swVersion="5.7.2" x-pid="14068" x-info="http://www.rsyslog.com"] start
Jan 17 09:56:16 dlang-laptop testfile: message1-1
message1-2
 message1-3
 message1-4
Jan 17 09:56:16 dlang-laptop testfile: message2-1
message2-2
 message2-3
 message2-4
Jan 17 09:56:16 dlang-laptop testfile: message3-1
message3-2
 message3-3
 message3-4
Jan 17 09:56:19 dlang-laptop kernel: Kernel logging (proc) stopped.
Jan 17 09:56:19 dlang-laptop rsyslogd: [origin software="rsyslogd" swVersion="5.7.2" x-pid="14068" x-info="http://www.rsyslog.com"] exiting on signal 2.
Jan 17 09:56:44 dlang-laptop kernel: imklog 5.7.2, log source = /proc/kmsg started.
Jan 17 09:56:44 dlang-laptop rsyslogd: [origin software="rsyslogd" swVersion="5.7.2" x-pid="14076" x-info="http://www.rsyslog.com"] start
Jan 17 09:56:44 dlang-laptop testfile: message1-1
Jan 17 09:56:44 dlang-laptop testfile: message1-2
 message1-3
 message1-4
Jan 17 09:56:44 dlang-laptop testfile: message2-1
Jan 17 09:56:44 dlang-laptop testfile: message2-2
 message2-3
 message2-4
Jan 17 09:56:44 dlang-laptop testfile: message3-1
Jan 17 09:56:44 dlang-laptop testfile: message3-2
 message3-3
 message3-4
Jan 17 09:56:45 dlang-laptop kernel: Kernel logging (proc) stopped.
Jan 17 09:56:45 dlang-laptop rsyslogd: [origin software="rsyslogd" swVersion="5.7.2" x-pid="14076" x-info="http://www.rsyslog.com"] exiting on signal 2.
diff --git a/ChangeLog b/ChangeLog
index 18eb97b..f4084f3 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,6 @@
 ---------------------------------------------------------------------------
 Version 5.7.3  [V5-DEVEL] (rgerhards), 2010-12-??
+- added support for processing multi-line messages in imfile
 - added $IMUDPSchedulingPolicy and $IMUDPSchedulingPriority config settings
 - added $LocalHostName config directive
 - bugfix: fixed build problems on some platforms
diff --git a/doc/imfile.html b/doc/imfile.html
index f6b140a..66c13e0 100644
--- a/doc/imfile.html
+++ b/doc/imfile.html
@@ -96,6 +96,11 @@ been processed. This setting can be used to guard against message duplication du
 to fatal errors (like power fail). Note that this setting affects imfile
 performance, especially when set to a low value. Frequently writing the state
 file is very time consuming.
+<li><b>$InputFileReadMode</b> [mode]<br>
+Available in 5.7.3+
+<br>
+Mode to be used when reading lines. 0 (the default) means that each line is forwarded
+as its own log message.
 </ul>
 <b>Caveats/Known Bugs:</b>
 <p>So far, only 100 files can be monitored. If more are needed,
diff --git a/plugins/imfile/imfile.c b/plugins/imfile/imfile.c
index 8681ac8..c205f60 100644
--- a/plugins/imfile/imfile.c
+++ b/plugins/imfile/imfile.c
@@ -71,6 +71,7 @@ typedef struct fileInfo_s {
 	int nRecords; /**< How many records did we process before persisting the stream? */
 	int iPersistStateInterval; /**< how often should state be persisted? (0=on close only) */
 	strm_t *pStrm;	/* its stream (NULL if not assigned) */
+	int readMode;	/* which mode to use in ReadLine call? */
 } fileInfo_t;
 
 
@@ -85,6 +86,7 @@ static int iPollInterval = 10;	/* number of seconds to sleep when there was no f
 static int iPersistStateInterval = 0;	/* how often if state file to be persisted? (default 0->never) */
 static int iFacility = 128; /* local0 */
 static int iSeverity = 5;  /* notice, as of rfc 3164 */
+static int readMode = 0;  /* mode to use for ReadLine call */
 
 static int iFilPtr = 0;		/* number of files to be monitored; pointer to next free spot during config */
 #define MAX_INPUT_FILES 100
@@ -212,7 +214,7 @@ static rsRetVal pollFile(fileInfo_t *pThis, int *pbHadFileData)
 
 	/* loop below will be exited when strmReadLine() returns EOF */
 	while(1) {
-		CHKiRet(strm.ReadLine(pThis->pStrm, &pCStr));
+		CHKiRet(strm.ReadLine(pThis->pStrm, &pCStr, pThis->readMode));
 		*pbHadFileData = 1; /* this is just a flag, so set it and forget it */
 		CHKiRet(enqLine(pThis, pCStr)); /* process line */
 		rsCStrDestruct(&pCStr); /* discard string (must be done by us!) */
@@ -447,6 +449,7 @@ static rsRetVal resetConfigVariables(uchar __attribute__((unused)) *pp, void __a
 	iPollInterval = 10;
 	iFacility = 128; /* local0 */
 	iSeverity = 5;  /* notice, as of rfc 3164 */
+	readMode = 0;
 
 	RETiRet;
 }
@@ -489,6 +492,7 @@ static rsRetVal addMonitor(void __attribute__((unused)) *pVal, uchar *pNewVal)
 		pThis->iFacility = iFacility;
 		pThis->iPersistStateInterval = iPersistStateInterval;
 		pThis->nRecords = 0;
+		pThis->readMode = readMode;
 		iPersistStateInterval = 0;
 	} else {
 		errmsg.LogError(0, RS_RET_OUT_OF_DESRIPTORS, "Too many file monitors configured - ignoring this one");
@@ -535,6 +539,8 @@ CODEmodInit_QueryRegCFSLineHdlr
 	  	NULL, &iFacility, STD_LOADABLE_MODULE_ID));
 	CHKiRet(omsdRegCFSLineHdlr((uchar *)"inputfilepollinterval", 0, eCmdHdlrInt,
 	  	NULL, &iPollInterval, STD_LOADABLE_MODULE_ID));
+	CHKiRet(omsdRegCFSLineHdlr((uchar *)"inputfilereadmode", 0, eCmdHdlrInt,
+	  	NULL, &readMode, STD_LOADABLE_MODULE_ID));
 	CHKiRet(omsdRegCFSLineHdlr((uchar *)"inputfilepersiststateinterval", 0, eCmdHdlrInt,
 	  	NULL, &iPersistStateInterval, STD_LOADABLE_MODULE_ID));
 	/* that command ads a new file! */
diff --git a/runtime/stream.c b/runtime/stream.c
index 658aba1..16d41a2 100644
--- a/runtime/stream.c
+++ b/runtime/stream.c
@@ -561,39 +561,98 @@ static rsRetVal strmUnreadChar(strm_t *pThis, uchar c)
 	return RS_RET_OK;
 }
 
-
-/* read a line from a strm file. A line is terminated by LF. The LF is read, but it
- * is not returned in the buffer (it is discared). The caller is responsible for
- * destruction of the returned CStr object! -- rgerhards, 2008-01-07
- * rgerhards, 2008-03-27: I now use the ppCStr directly, without any interim
- * string pointer. The reason is that this function my be called by inputs, which
- * are pthread_killed() upon termination. So if we use their native pointer, they
- * can cleanup (but only then).
+/* read a 'paragraph' from a strm file.
+ * A paragraph may be terminated by a LF, by a LFLF, or by LF<not whitespace> depending on the option set.
+ * The terminating LF characters are read, but are
+ * not returned in the buffer (they are discarded). The caller is responsible for
+ * destruction of the returned CStr object! -- dlang 2010-12-13
  */
 static rsRetVal
-strmReadLine(strm_t *pThis, cstr_t **ppCStr)
+strmReadLine(strm_t *pThis, cstr_t **ppCStr, int mode)
 {
-	DEFiRet;
-	uchar c;
-
-	ASSERT(pThis != NULL);
-	ASSERT(ppCStr != NULL);
-
-	CHKiRet(cstrConstruct(ppCStr));
-
-	/* now read the line */
-	CHKiRet(strmReadChar(pThis, &c));
-	while(c != '\n') {
-		CHKiRet(cstrAppendChar(*ppCStr, c));
-		CHKiRet(strmReadChar(pThis, &c));
+	/* mode = 0 single line mode (equivalent to ReadLine)
+         * mode = 1 LFLF mode (paragraph, blank line between entries)
+         * mode = 2 LF <not whitespace> mode, a log line starts at the beginning of a line, but following lines that are indented are part of the same log entry
+	 *  This modal interface is not nearly as flexible as being able to define a regex for when a new record starts, but it's also not nearly as hard (or as slow) to implement
+         */
+        DEFiRet;
+        uchar c;
+	uchar finished;
+
+        ASSERT(pThis != NULL);
+        ASSERT(ppCStr != NULL);
+
+        CHKiRet(cstrConstruct(ppCStr));
+
+        /* now read the line */
+        CHKiRet(strmReadChar(pThis, &c));
+        if (mode == 0){
+        	while(c != '\n') {
+                	CHKiRet(cstrAppendChar(*ppCStr, c));
+                	CHKiRet(strmReadChar(pThis, &c));
+        	}
+        	CHKiRet(cstrFinalize(*ppCStr));
+	}
+        if (mode == 1){
+		finished=0;
+		while(finished == 0){
+        		if(c != '\n') {
+                		CHKiRet(cstrAppendChar(*ppCStr, c));
+                		CHKiRet(strmReadChar(pThis, &c));
+			} else {
+				if ((((*ppCStr)->iStrLen) > 0) ){
+					if ((*ppCStr)->pBuf[(*ppCStr)->iStrLen -1 ] == '\n'){
+						rsCStrTruncate(*ppCStr,1); /* remove the prior newline */
+						finished=1;
+					} else {
+               					CHKiRet(cstrAppendChar(*ppCStr, c));
+               					CHKiRet(strmReadChar(pThis, &c));
+					}
+				} else {
+					finished=1;  /* this is a blank line, a \n with nothing since the last complete record */
+				}
+			}
+		}
+        	CHKiRet(cstrFinalize(*ppCStr));
+	}
+        if (mode == 2){
+/* indented follow-up lines */
+		finished=0;
+		while(finished == 0){
+			if ((*ppCStr)->iStrLen == 0){
+        			if(c != '\n') {
+/* nothing in the buffer, and it's not a newline, add it to the buffer */
+               				CHKiRet(cstrAppendChar(*ppCStr, c));
+               				CHKiRet(strmReadChar(pThis, &c));
+				} else {
+					finished=1;  /* this is a blank line, a \n with nothing since the last complete record */
+				}
+			} else {
+				if ((*ppCStr)->pBuf[(*ppCStr)->iStrLen -1 ] != '\n'){
+/* not the first character after a newline, add it to the buffer */
+               				CHKiRet(cstrAppendChar(*ppCStr, c));
+               				CHKiRet(strmReadChar(pThis, &c));
+				} else {
+					if ((c == ' ') || (c == '\t')){
+               					CHKiRet(cstrAppendChar(*ppCStr, c));
+               					CHKiRet(strmReadChar(pThis, &c));
+					} else {
+/* clean things up by putting the character we just read back into the input buffer and removing the LF character that is currently at the end of the output string */
+						CHKiRet(strmUnreadChar(pThis, c));
+						rsCStrTruncate(*ppCStr,1);
+						finished=1;
+					}
+				}
+			}
+		}
+       		CHKiRet(cstrFinalize(*ppCStr));
 	}
-	CHKiRet(cstrFinalize(*ppCStr));
 
 finalize_it:
-	if(iRet != RS_RET_OK && *ppCStr != NULL)
-		cstrDestruct(ppCStr);
+        if(iRet != RS_RET_OK && *ppCStr != NULL)
+                cstrDestruct(ppCStr);
 
-	RETiRet;
+        RETiRet;
 }
 
 
diff --git a/runtime/stream.h b/runtime/stream.h
index 37e9d57..60c68cb 100644
--- a/runtime/stream.h
+++ b/runtime/stream.h
@@ -156,7 +156,6 @@ BEGINinterface(strm) /* name must also be changed in ENDinterface macro! */
 	rsRetVal (*SetFileName)(strm_t *pThis, uchar *pszName, size_t iLenName);
 	rsRetVal (*ReadChar)(strm_t *pThis, uchar *pC);
 	rsRetVal (*UnreadChar)(strm_t *pThis, uchar c);
-	rsRetVal (*ReadLine)(strm_t *pThis, cstr_t **ppCStr);
 	rsRetVal (*SeekCurrOffs)(strm_t *pThis);
 	rsRetVal (*Write)(strm_t *pThis, uchar *pBuf, size_t lenBuf);
 	rsRetVal (*WriteChar)(strm_t *pThis, uchar c);
@@ -183,8 +182,10 @@ BEGINinterface(strm) /* name must also be changed in ENDinterface macro! */
 	INTERFACEpropSetMeth(strm, iSizeLimit, off_t);
 	INTERFACEpropSetMeth(strm, iFlushInterval, int);
 	INTERFACEpropSetMeth(strm, pszSizeLimitCmd, uchar*);
+	/* v6 added */
+	rsRetVal (*ReadLine)(strm_t *pThis, cstr_t **ppCStr, int mode);
 ENDinterface(strm)
-#define strmCURR_IF_VERSION 5 /* increment whenever you change the interface structure! */
+#define strmCURR_IF_VERSION 6 /* increment whenever you change the interface structure! */
 
 
 /* prototypes */
message1-1
message1-2
 message1-3
 message1-4

message2-1
message2-2
 message2-3
 message2-4

message3-1
message3-2
 message3-3
 message3-4

