Re: [rsyslog] liblognorm tokenize issue

David Lang Mon, 01 Jun 2015 14:27:12 -0700

On Fri, 29 May 2015, David Lang wrote:

On Fri, 29 May 2015, Rainer Gerhards wrote:

2015-05-29 13:53 GMT+02:00 singh.janmejay <[email protected]>:

Should we add an optional argument to word: except?

Eg.

%foo:word:&%%bar:word%

Given baz&quux, this would give us:

{"foo" : "baz", "bar": "quux"}

If we accept multiple chars (allowing escaped Unicode sequences), the
default value of this field can be 'space' and 'tab'.


Isn't that char-to? At least, that is what I added char-to for...

Almost, except that char-to (and char-sep) don't allow you to specify multiple characters as the end of the item.


char-sep will fail if you have

a&b c

and use the rule %foo:tokenize:&:char-sep:&% c

Attached is a patch that lets you specify multiple characters for char-to and char-sep; any one of the characters will match. So with the example above, using


rule=:%foo:tokenize:&:char-sep:& % c

# echo 'a&b c' |./lognormalizer -r del -e json

you get

{ "foo": [ "a", "b" ] }

David Lang

diff --git a/doc/configuration.rst b/doc/configuration.rst
index ec44f4d..2f87647 100644
--- a/doc/configuration.rst
+++ b/doc/configuration.rst
@@ -198,8 +198,8 @@ char-to
 ####### 
 
 One or more characters, up to the next character given in
-extra data. Additional data must contain exactly one character, which
-can be escaped.
+extra data. Additional data must contain one or more characters, which
+can be escaped. If multiple characters are given, any of them will match.
 
 ::
 
@@ -210,8 +210,9 @@ char-sep
 ########
 
 Zero or more characters, up to the next character given in extra data, or 
-up to end of line. Additional data must contain exactly one character, 
-which can be escaped.               
+up to end of line. Additional data must contain one or more characters,
+which can be escaped. If multiple characters are given, any of them will
+match.
 
 ::
 
diff --git a/src/parser.c b/src/parser.c
index a655355..4964d33 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -808,8 +808,8 @@ done:
 
 /**
  * Parse everything up to a specific character.
- * The character must be the only char inside extra data passed to the parser.
- * It is a program error if strlen(ed) != 1. It is considered a format error if
+ * If there are multiple characters in ed, all are checked as possible delimiters.
+ * It is considered a format error if
  * a) the to-be-parsed buffer is already positioned on the terminator character
  * b) there is no terminator until the end of the buffer
  * In those cases, the parsers declares itself as not being successful, in all
@@ -818,21 +818,32 @@ done:
 PARSER(CharTo)
 	const char *c;
 	unsigned char cTerm;
-	size_t i;
+	const char *toFind;
+	size_t i, j, k;
 
 	assert(str != NULL);
 	assert(offs != NULL);
 	assert(parsed != NULL);
-	assert(es_strlen(ed) == 1);
-	cTerm = *(es_getBufAddr(ed));
+	assert(ed != NULL);
+	k = es_strlen(ed);
+	toFind = es_str2cstr(ed, NULL);
+	cTerm = 0;
 	c = str;
 	i = *offs;
 
 	/* search end of word */
-	while(i < strLen && c[i] != cTerm) 
-		i++;
+	while(i < strLen && !cTerm) {
+                for (j=0;j < k; j++) {
+                        if (c[i] == toFind[j]) {
+                                cTerm = 1;
+                                break;
+                        }
+                }
+                if (!cTerm)
+		        i++;
+        }
 
-	if(i == *offs || i == strLen || c[i] != cTerm)
+	if(i == *offs || i == strLen || !cTerm)
 		goto done;
 
 	/* success, persist */
@@ -846,8 +857,7 @@ done:
 
 /**
  * Parse everything up to a specific character, or up to the end of string.
- * The character must be the only char inside extra data passed to the parser.
- * It is a program error if strlen(ed) != 1.
+ * If there are multiple characters in ed, all are checked as possible delimiters.
  * This parser always returns success.
  * By nature of the parser, it is required that end of string or the separator
  * follows this field in rule.
@@ -855,19 +865,30 @@ done:
 PARSER(CharSeparated)
 	const char *c;
 	unsigned char cTerm;
-	size_t i;
+	const char *toFind;
+	size_t i, j, k;
 
 	assert(str != NULL);
 	assert(offs != NULL);
 	assert(parsed != NULL);
-	assert(es_strlen(ed) == 1);
-	cTerm = *(es_getBufAddr(ed));
+	assert(ed != NULL);
+	k = es_strlen(ed);
+	toFind = es_str2cstr(ed, NULL);
+	cTerm = 0;
 	c = str;
 	i = *offs;
 
 	/* search end of word */
-	while(i < strLen && c[i] != cTerm) 
-		i++;
+	while(i < strLen && !cTerm) {
+                for (j=0;j < k; j++) {
+                        if (c[i] == toFind[j]) {
+                                cTerm = 1;
+                                break;
+                        }
+                }
+                if (!cTerm)
+		        i++;
+        }
 
 	/* success, persist */
 	*parsed = i - *offs;

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
