On Thu, 30 May 2013, Gary Foster wrote:

I'm assuming you mean "what's missing from mmnormalize"; if that's not what you meant, I apologize in advance. What is "missing" is the deep level of regexp support I need. The existing regexp support is not nearly robust enough for what I've been trying to do.

Here's a sample log line of what I am parsing with logstash:

May 30 17:52:22 mediacast5 rg_events: 10.12.247.179 - - [30/May/2013:17:52:22 +0000] "GET /events?rg_type=2.4.5:revenue&rg_player_type=standard&rg_publisher=HGTV&rg_publisher_id=1248&rg_domain_category_id=&rg_domain_id=d5e19628176ce4f2ae05a06a4bd9a2f1&rg_page_host_url=Scripting%20Error%20TypeError:%20Cannot%20read%20property%20'width'%20of%20null&rg_ad_domain_id=null&rg_player_uuid=24355d48-cda7-43e4-aabc-76e402764bea&rg_video_provider_id=603&rg_video_catalog_id=562&rg_video_index_id=26&rg_guid=68664b27-3510-48f4-a1be-d0d0b64d3115&rg_session=6305b824c217985075259a92b90542c1&rg_counter=1&rg_event=jwplayerReady&rg_category=Player&rg_action=Impression&rg_ads_version=rg_all_13.140.12.19_flex_sdk_3.6.0.16995_DOUBLECLICK&rg_comscore_version=13.140.12.19_flex_sdk_3.6.0.16995&rg_coordinates=null&rg_visible=null&rg_size=null HTTP/1.1" 200 0 "http://<url elided>" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36"

I can easily use mmnormalize to parse out most of the large chunks of stuff. However, when I get to this part:

GET /events?rg_type=2.4.5:revenue&rg_player_type=standard&rg_publisher=HGTV&rg_publisher_id=1248&rg_domain_category_id=&rg_domain_id=d5e19628176ce4f2ae05a06a4bd9a2f1&rg_page_host_url=Scripting%20Error%20TypeError:%20Cannot%20read%20property%20'width'%20of%20null&rg_ad_domain_id=null&rg_player_uuid=24355d48-cda7-43e4-aabc-76e402764bea&rg_video_provider_id=603&rg_video_catalog_id=562&rg_video_index_id=26

(The rest is snipped for brevity.) I then need to break that up into discrete K/V pairs. I want to end up with a structure that looks vaguely like this:

{"rg_type": "2.4.5:revenue", "rg_player_type": "standard",
 "rg_publisher": "HGTV", "rg_action": "Impression", ...}

You get the idea? I basically need to split up all the params on the request line into K/V pairs. Now, these values are in an arbitrary order. They are also not always all there. The set of pairs is dynamic (we add and remove various beacons as we conduct more experiments). Some of the fields are URL encoded and need to be decoded as well.
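
For illustration only, here is a minimal sketch of the splitting described above, written in Python rather than as anything mmnormalize or logstash actually provides. The function name request_params is hypothetical; urlsplit and parse_qsl are from the standard urllib.parse module. It handles arbitrary field order, missing fields, empty values, and URL decoding.

```python
from urllib.parse import urlsplit, parse_qsl


def request_params(request_line):
    """Return the query parameters of an HTTP request line as a dict.

    request_line looks like 'GET /events?a=1&b=2 HTTP/1.1'.
    Hypothetical helper for illustration; not part of any rsyslog module.
    """
    parts = request_line.split()
    if not parts:
        return {}
    # The request target is the second token when a method precedes it.
    target = parts[1] if len(parts) >= 2 else parts[0]
    query = urlsplit(target).query
    # keep_blank_values=True preserves fields such as rg_domain_category_id=
    # parse_qsl also percent-decodes names and values (and turns '+' into a space).
    return dict(parse_qsl(query, keep_blank_values=True))
```

Because the result is a plain dict, a parameter that appears twice keeps only its last value; a real implementation would have to decide how to treat repeats.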

Ok, just restating the problem to be sure I am understanding it correctly.

Currently mmnormalize has fairly poor support for name-value pairs. It has the 'iptables' datatype, which can handle whitespace-separated name=value pairs (a bad name for it, especially since it's the way to handle RFC5424 log messages as well :-)

But in your case, the separator between name=value pairs is '&' instead of a space.

It sounds like what's needed is a namevalue module that lets you specify

1. what the separator character is between name-value pairs (default whitespace)

2. what the separator character is inside name-value pairs (default '=')

3. quote character (default '"')

4. escape character (default none)

Do we need to support doubled-quote escaping, where "" means a literal " inside the text?

5. root variable (default none). In the example above, you want something like getparams{rg_type:....} instead of having rg_type as a top-level variable.

6. separator character to terminate the match? Do we need this, or do we just put this character after the call to the namevalue datatype?

iptables would be a special case of this. URL parsing is common enough that it's probably worth making a urlparse datatype with the appropriate defaults for it.
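
To make the proposal concrete, here is a rough sketch of the options listed above, in Python purely for illustration. Nothing here exists in mmnormalize or liblognorm; parse_namevalue, parse_url_params, and all parameter names are hypothetical. It covers items 1 through 5 and leaves out doubled-quote escaping and the terminating character from item 6.

```python
from urllib.parse import unquote


def parse_namevalue(text, pair_sep=None, kv_sep="=", quote='"',
                    escape=None, root=None):
    """Split text into name/value pairs (hypothetical sketch).

    pair_sep=None means "any whitespace", like the iptables-style default.
    """
    tokens, current, in_quote = [], [], False
    i = 0
    while i < len(text):
        ch = text[i]
        if escape is not None and ch == escape and i + 1 < len(text):
            current.append(text[i + 1])  # take the escaped character literally
            i += 2
            continue
        if quote and ch == quote:
            in_quote = not in_quote      # quote characters are dropped
        elif not in_quote and (ch.isspace() if pair_sep is None
                               else ch == pair_sep):
            if current:                  # ignore empty tokens between separators
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if current:
        tokens.append("".join(current))

    result = {}
    for token in tokens:
        name, _, value = token.partition(kv_sep)
        result[name] = value
    return {root: result} if root else result


def parse_url_params(text, root=None):
    """URL-style defaults: '&' between pairs, no quoting, percent-decoding."""
    pairs = parse_namevalue(text, pair_sep="&", quote=None)
    decoded = {unquote(k): unquote(v) for k, v in pairs.items()}
    return {root: decoded} if root else decoded
```

With the defaults, parse_namevalue('IN=eth0 OUT= MSG="hello world"') behaves like the whitespace-separated case, while parse_url_params(query, root="getparams") nests the decoded pairs under a single root variable.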

Gary, does something like this sound like what you are needing?

Rainer, I'm thinking that this should be fairly easy to implement (generalizing the iptables datatype), does it sound like it to you?

David Lang
