I've done with some performance comparison of mmnormalize based on old (based on libee with conversion to json) and new (pure json) liblognorm.

Attached are two patches to mmnormalize, first one to match new liblognorm, second is just a feature, allowing to store normalized fields under arbitrary JSON subtree. Second patch does not affect performance noticeably, though it allows to avoid merging of JSON objects.

Hope someone will give it a try while Rainer is busy.


1. Test setup:

Sending copy of 50000 real syslog packets with tcpreplay via udp. Rsyslog is configured to receive UDP, normalize it and write to a file.

Ruleset(name="logd"
    queue.type="FixedArray"
    queue.size="200000"
    queue.workerthreads="2"
    queue.timeoutenqueue="0"
    queue.dequeuebatchsize="1024"
    ){

        Action(
            type="mmnormalize"
            userawmsg="on"
            rulebase="/usr/local/etc/rsyslog.d.norm/rules.rb"
        )
       Action(
            type="omfile"
            DynaFile="rawFilename"
            template="rawTemplate"
        )

}

CPU load on the target host with different versions of mmnormalize:

No mmnormalize, 12000rps

00:18:19 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle 00:18:22 all 0,51 0,00 0,00 0,00 0,00 0,00 0,00 0,00 99,49 00:18:23 all 7,41 0,00 5,29 0,00 0,00 1,06 0,00 0,00 86,24 00:18:24 all 13,04 0,00 9,24 0,00 0,00 1,09 0,00 0,00 76,63 00:18:25 all 12,97 0,00 10,27 0,00 0,00 2,16 0,00 0,00 74,59 00:18:26 all 10,98 0,00 8,09 0,00 0,00 0,00 0,00 0,00 80,92 00:18:27 all 11,49 0,00 7,47 0,00 0,00 0,57 0,00 0,00 80,46 00:18:28 all 3,61 0,00 1,55 0,00 0,00 0,52 0,00 0,00 94,33 00:18:29 all 0,50 0,00 2,51 1,01 0,00 0,00 0,00 0,00 95,98

Stock mmnormalize, 12000rps

00:18:42 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle 00:18:43 all 0,99 0,00 0,50 0,00 0,00 0,00 0,00 0,00 98,51 00:18:44 all 34,41 0,00 3,23 0,00 0,00 2,15 0,00 0,00 60,22 00:18:45 all 44,92 0,00 7,49 0,00 0,00 1,60 0,00 0,00 45,99 00:18:46 all 49,72 0,00 5,08 0,00 0,00 1,13 0,00 0,00 44,07 00:18:47 all 45,30 0,00 4,97 0,00 0,00 3,31 0,00 0,00 46,41 00:18:48 all 45,36 0,00 5,46 0,55 0,00 3,83 0,00 0,00 44,81 00:18:49 all 3,05 0,00 1,52 0,00 0,00 0,00 0,00 0,00 95,43 00:18:50 all 0,50 0,00 0,00 0,00 0,00 0,00 0,00 0,00 99,50

New mmnormalize, 12000rps

00:19:08 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle 00:19:09 all 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 00:19:10 all 10,77 0,00 2,56 0,51 0,00 1,03 0,00 0,00 85,13 00:19:11 all 23,78 0,00 7,03 0,00 0,00 0,54 0,00 0,00 68,65 00:19:12 all 26,92 0,00 4,40 0,00 0,00 1,65 0,00 0,00 67,03 00:19:13 all 23,24 0,00 8,11 0,00 0,00 2,16 0,00 0,00 66,49 00:19:14 all 27,27 0,00 5,88 0,00 0,00 1,07 0,00 0,00 65,78 00:19:15 all 16,67 0,00 4,30 0,00 0,00 0,54 0,00 0,00 78,49 00:19:16 all 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00


2. Statistics from a real server, about 95000rps, storing normalized values to Redis:

With old module:

03:20:01 PM     CPU     %user     %nice   %system   %iowait %steal     %idle
03:30:01 PM     all     34.33      0.00      3.44      0.00 0.00     62.23
03:40:01 PM     all     34.47      0.00      3.41      0.00 0.00     62.12
03:50:01 PM     all     34.24      0.00      3.47      0.00 0.00     62.28
04:00:01 PM     all     34.36      0.00      3.36      0.00 0.00     62.28
04:10:01 PM     all     34.60      0.00      3.42      0.00 0.00     61.98
04:20:01 PM     all     34.62      0.00      3.46      0.03 0.00     61.90
04:30:01 PM     all     34.71      0.00      3.40      0.00 0.00     61.89
04:40:01 PM     all     34.92      0.00      3.42      0.00 0.00     61.66
04:50:01 PM     all     34.93      0.00      3.40      0.00 0.00     61.67
05:00:01 PM     all     34.97      0.00      3.39      0.00 0.00     61.63
05:10:01 PM     all     34.85      0.00      3.37      0.00 0.00     61.78
05:20:01 PM     all     35.00      0.00      3.43      0.03 0.00     61.54
05:30:01 PM     all     34.85      0.00      3.38      0.00 0.00     61.77

Then, after switch to new version:

05:40:01 PM     all     30.05      0.00      3.22      0.00 0.00     66.73
05:50:01 PM     all     22.71      0.00      3.12      0.00 0.00     74.16
06:00:01 PM     all     24.21      0.00      3.84      0.00 0.00     71.95
06:10:01 PM     all     25.00      0.00      4.05      0.00 0.00     70.95
06:20:01 PM     all     25.33      0.00      4.00      0.04 0.00     70.63
06:30:01 PM     all     25.16      0.00      3.82      0.00 0.00     71.02


3. And one more comparison. In this test, I send 500000 messages from tcpreplay, at about 15000rps.

a) My implementation of re_extract_hash (extracts 7 submatches at once with a straightforward regexp):

# time /usr/local/rsyslog-regex/sbin/rsyslogd -f /usr/local/etc/rsyslog.re.conf -n

real    0m52.248s
user    0m45.187s
sys     0m7.456s

There was some backlog, that's why real time is greater than in other cases.

b) Old mmnormalize (just one rule, extracts the same 7 submatches)

# time /usr/local/rsyslog-stock/sbin/rsyslogd -f /usr/local/etc/rsyslog.conf.norm -n

real    0m40.663s
user    0m36.570s
sys     0m5.976s

c) New mmnormalize (the same rulebase):

# time /usr/local/rsyslog-path/sbin/rsyslogd -f /usr/local/etc/rsyslog.conf.norm -n

real    0m40.054s
user    0m30.658s
sys     0m5.272s

This clearly shows that regexps should be avoided, where possible, if CPU load is a concern.


--
Pavel Levshin


30.10.2013 23:50, Pavel Levshin:

So, I have taken the opportunity and refactored liblognorm to use json-c instead of libee. Some parts of libee now present in liblognorm, notably field parsers and encoders. They were rewritten to get rid of libee data structures. In the same time, many bugs were fixed, and many were undoubtedly produced.

Current state of the library can be seen here:

https://github.com/flicker581/liblognorm/tree/master-json-c

It is work in progress, though. Lognormalizer works fine, but mmnormalize has not been updated yet. New version is somewhat slower than older versions used to be. In my tests it was ~40% slower. This slowdown is attributable to more complex memory management due to bigger allocations by json-c. Still, it should be much faster than mmnormalize with older liblognorm.

Comments are greatly welcome.


>From 77afbae6ee97391a7c3f72f091ef8ee4ebbfdcb3 Mon Sep 17 00:00:00 2001
From: Pavel Levshin <[email protected]>
Date: Thu, 31 Oct 2013 22:11:24 +0400
Subject: [PATCH 1/2] rewrite mmnormalize to use liblognorm_j

---
 configure.ac                      |    3 +--
 plugins/mmnormalize/Makefile.am   |    4 ++--
 plugins/mmnormalize/mmnormalize.c |   31 ++-----------------------------
 3 files changed, 5 insertions(+), 33 deletions(-)

diff --git a/configure.ac b/configure.ac
index 3abd559..bf9d34c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -931,8 +931,7 @@ AC_ARG_ENABLE(mmnormalize,
         [enable_mmnormalize=no]
 )
 if test "x$enable_mmnormalize" = "xyes"; then
-       PKG_CHECK_MODULES(LIBEE, libee >= 0.4.0)
-       PKG_CHECK_MODULES(LIBLOGNORM, lognorm >= 0.3.1)
+       PKG_CHECK_MODULES(LIBLOGNORM_J, lognorm_j >= 0.4.0)
 fi
 AM_CONDITIONAL(ENABLE_MMNORMALIZE, test x$enable_mmnormalize = xyes)
 
diff --git a/plugins/mmnormalize/Makefile.am b/plugins/mmnormalize/Makefile.am
index 0a3b5ba..c3b5083 100644
--- a/plugins/mmnormalize/Makefile.am
+++ b/plugins/mmnormalize/Makefile.am
@@ -1,8 +1,8 @@
 pkglib_LTLIBRARIES = mmnormalize.la
 
 mmnormalize_la_SOURCES = mmnormalize.c
-mmnormalize_la_CPPFLAGS =  $(RSRT_CFLAGS) $(PTHREADS_CFLAGS) 
$(LIBLOGNORM_CFLAGS) $(LIBEE_CFLAGS)
-mmnormalize_la_LDFLAGS = -module -avoid-version $(LIBLOGNORM_LIBS) 
$(LIBEE_LIBS)
+mmnormalize_la_CPPFLAGS =  $(RSRT_CFLAGS) $(PTHREADS_CFLAGS) 
$(LIBLOGNORM_J_CFLAGS)
+mmnormalize_la_LDFLAGS = -module -avoid-version $(LIBLOGNORM_J_LIBS)
 mmnormalize_la_LIBADD = 
 
 EXTRA_DIST = 
diff --git a/plugins/mmnormalize/mmnormalize.c 
b/plugins/mmnormalize/mmnormalize.c
index 7e25824..71baef4 100644
--- a/plugins/mmnormalize/mmnormalize.c
+++ b/plugins/mmnormalize/mmnormalize.c
@@ -39,7 +39,6 @@
 #include <errno.h>
 #include <unistd.h>
 #include <libestr.h>
-#include <libee/libee.h>
 #include <json.h>
 #include <liblognorm.h>
 #include "conf.h"
@@ -67,7 +66,6 @@ typedef struct _instanceData {
        sbool bUseRawMsg;       /**< use %rawmsg% instead of %msg% */
        uchar   *rulebase;      /**< name of rulebase to use */
        ln_ctx ctxln;           /**< context to be used for liblognorm */
-       ee_ctx ctxee;           /**< context to be used for libee */
 } instanceData;
 
 typedef struct configSettings_s {
@@ -103,23 +101,14 @@ static rsRetVal
 buildInstance(instanceData *pData)
 {
        DEFiRet;
-       if((pData->ctxee = ee_initCtx()) == NULL) {
-               errmsg.LogError(0, RS_RET_ERR_LIBEE_INIT, "error: could not 
initialize libee "
-                               "ctx, cannot activate action");
-               ABORT_FINALIZE(RS_RET_ERR_LIBEE_INIT);
-       }
-
        if((pData->ctxln = ln_initCtx()) == NULL) {
                errmsg.LogError(0, RS_RET_ERR_LIBLOGNORM_INIT, "error: could 
not initialize "
                                "liblognorm ctx, cannot activate action");
-               ee_exitCtx(pData->ctxee);
                ABORT_FINALIZE(RS_RET_ERR_LIBLOGNORM_INIT);
        }
-       ln_setEECtx(pData->ctxln, pData->ctxee);
        if(ln_loadSamples(pData->ctxln, (char*) pData->rulebase) != 0) {
                errmsg.LogError(0, RS_RET_NO_RULEBASE, "error: normalization 
rulebase '%s' "
                                "could not be loaded cannot activate action", 
cs.rulebase);
-               ee_exitCtx(pData->ctxee);
                ln_exitCtx(pData->ctxln);
                ABORT_FINALIZE(RS_RET_ERR_LIBLOGNORM_SAMPDB_LOAD);
        }
@@ -176,7 +165,6 @@ ENDisCompatibleWithFeature
 BEGINfreeInstance
 CODESTARTfreeInstance
        free(pData->rulebase);
-       ee_exitCtx(pData->ctxee);
        ln_exitCtx(pData->ctxln);
 ENDfreeInstance
 
@@ -195,12 +183,9 @@ BEGINdoAction
        msg_t *pMsg;
        es_str_t *str;
        uchar *buf;
-       char *cstrJSON;
        int len;
        int r;
-       struct ee_event *event = NULL;
-       struct json_tokener *tokener;
-       struct json_object *json;
+       struct json_object *json = NULL;
 CODESTARTdoAction
        pMsg = (msg_t*) ppString[0];
        /* note that we can performance-optimize the interface, but this also
@@ -214,7 +199,7 @@ CODESTARTdoAction
                len = getMSGLen(pMsg);
        }
        str = es_newStrFromCStr((char*)buf, len);
-       r = ln_normalize(pData->ctxln, str, &event);
+       r = ln_normalize(pData->ctxln, str, &json);
        if(r != 0) {
                DBGPRINTF("error %d during ln_normalize\n", r);
                MsgSetParseSuccess(pMsg, 0);
@@ -223,20 +208,8 @@ CODESTARTdoAction
        }
        es_deleteStr(str);
 
-       /* reformat to our json data struct */
-       /* TODO: this is all extremly ineffcient! */
-       ee_fmtEventToJSON(event, &str);
-       cstrJSON = es_str2cstr(str, NULL);
-       ee_deleteEvent(event);
-       dbgprintf("mmnormalize generated: %s\n", cstrJSON);
-
-       tokener = json_tokener_new();
-       json = json_tokener_parse_ex(tokener, cstrJSON, 
strlen((char*)cstrJSON));
-       json_tokener_free(tokener);
        msgAddJSON(pMsg, (uchar*)"!", json);
 
-       free(cstrJSON);
-       es_deleteStr(str);
 ENDdoAction
 
 
-- 
1.7.9.5

>From 9f857a8febab2a45f34715d9a47c7e05fede02a7 Mon Sep 17 00:00:00 2001
From: Pavel Levshin <[email protected]>
Date: Sat, 2 Nov 2013 15:35:45 +0400
Subject: [PATCH 2/2] Allow path parameter to place normalized fields as
 subtree

---
 plugins/mmnormalize/mmnormalize.c |   23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/plugins/mmnormalize/mmnormalize.c 
b/plugins/mmnormalize/mmnormalize.c
index 71baef4..068851b 100644
--- a/plugins/mmnormalize/mmnormalize.c
+++ b/plugins/mmnormalize/mmnormalize.c
@@ -66,6 +66,7 @@ typedef struct _instanceData {
        sbool bUseRawMsg;       /**< use %rawmsg% instead of %msg% */
        uchar   *rulebase;      /**< name of rulebase to use */
        ln_ctx ctxln;           /**< context to be used for liblognorm */
+       char *pszPath;      /**< path of normalized data */
 } instanceData;
 
 typedef struct configSettings_s {
@@ -78,6 +79,7 @@ static configSettings_t cs;
 /* action (instance) parameters */
 static struct cnfparamdescr actpdescr[] = {
        { "rulebase", eCmdHdlrGetWord, 1 },
+       { "path", eCmdHdlrGetWord, 0 },
        { "userawmsg", eCmdHdlrBinary, 0 }
 };
 static struct cnfparamblk actpblk =
@@ -166,6 +168,7 @@ BEGINfreeInstance
 CODESTARTfreeInstance
        free(pData->rulebase);
        ln_exitCtx(pData->ctxln);
+       free(pData->pszPath);
 ENDfreeInstance
 
 
@@ -208,7 +211,7 @@ CODESTARTdoAction
        }
        es_deleteStr(str);
 
-       msgAddJSON(pMsg, (uchar*)"!", json);
+       msgAddJSON(pMsg, (uchar *)pData->pszPath + 1, json);
 
 ENDdoAction
 
@@ -218,12 +221,14 @@ setInstParamDefaults(instanceData *pData)
 {
        pData->rulebase = NULL;
        pData->bUseRawMsg = 0;
+       pData->pszPath = strdup("$!");
 }
 
 BEGINnewActInst
        struct cnfparamvals *pvals;
        int i;
        int bDestructPValsOnExit;
+       char *cstr;
 CODESTARTnewActInst
        DBGPRINTF("newActInst (mmnormalize)\n");
 
@@ -251,6 +256,22 @@ CODESTARTnewActInst
                        pData->rulebase = 
(uchar*)es_str2cstr(pvals[i].val.d.estr, NULL);
                } else if(!strcmp(actpblk.descr[i].name, "userawmsg")) {
                        pData->bUseRawMsg = (int) pvals[i].val.d.n;
+               } else if(!strcmp(actpblk.descr[i].name, "path")) {
+                       cstr = es_str2cstr(pvals[i].val.d.estr, NULL);
+                       if (strlen(cstr) < 2) {
+                               errmsg.LogError(0, RS_RET_VALUE_NOT_SUPPORTED,
+                                               "mmnormalize: valid path name 
should be at least "
+                                               "2 symbols long, got %s",       
cstr);
+                               free(cstr);
+                       } else if (cstr[0] != '$') {
+                               errmsg.LogError(0, RS_RET_VALUE_NOT_SUPPORTED,
+                                               "mmnormalize: valid path name 
should start with $,"
+                                               "got %s", cstr);
+                               free(cstr);
+                       } else {
+                               pData->pszPath = cstr;
+                       }
+                       continue;
                } else {
                        DBGPRINTF("mmnormalize: program error, non-handled "
                          "param '%s'\n", actpblk.descr[i].name);
-- 
1.7.9.5

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to