On Thu, 30 May 2013, Gary Foster wrote:
I'm assuming you mean "what's missing from mmnormalize", if that's not what
you meant I apologize in advance. What is "missing" would be the deep level
of regexp support I need. The regexp support is not nearly robust enough to
support what I've been trying to do.
Here's a sample log line of what I am parsing with logstash:
May 30 17:52:22 mediacast5 rg_events: 10.12.247.179 - - [30/May/2013:17:52:22
+0000] "GET
/events?rg_type=2.4.5:revenue&rg_player_type=standard&rg_publisher=HGTV&rg_publisher_id=1248&rg_domain_category_id=&rg_domain_id=d5e19628176ce4f2ae05a06a4bd9a2f1&rg_page_host_url=Scripting%20Error%20TypeError:%20Cannot%20read%20property%20'width'%20of%20null&rg_ad_domain_id=null&rg_player_uuid=24355d48-cda7-43e4-aabc-76e402764bea&rg_video_provider_id=603&rg_video_catalog_id=562&rg_video_index_id=26&rg_guid=68664b27-3510-48f4-a1be-d0d0b64d3115&rg_session=6305b824c217985075259a92b90542c1&rg_counter=1&rg_event=jwplayerReady&rg_category=Player&rg_action=Impression&rg_ads_version=rg_all_13.140.12.19_flex_sdk_3.6.0.16995_DOUBLECLICK&rg_comscore_version=13.140.12.19_flex_sdk_3.6.0.16995&rg_coordinates=null&rg_visible=null&rg_size=null
HTTP/1.1" 200 0 "http://<url elided>" "Mozilla/5.0 (Windows NT 6.1; WOW64)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36"
I can easily use mmnormalize to parse out most of the large chunks of stuff.
However, when I get to this part:
GET
/events?rg_type=2.4.5:revenue&rg_player_type=standard&rg_publisher=HGTV&rg_publisher_id=1248&rg_domain_category_id=&rg_domain_id=d5e19628176ce4f2ae05a06a4bd9a2f1&rg_page_host_url=Scripting%20Error%20TypeError:%20Cannot%20read%20property%20'width'%20of%20null&rg_ad_domain_id=null&rg_player_uuid=24355d48-cda7-43e4-aabc-76e402764bea&rg_video_provider_id=603&rg_video_catalog_id=562&rg_video_index_id=26
(the rest snipped for brevity) I then need to break that up into discrete K/V
pairs. I want to end up with a structure that looks vaguely like this:
{"rg_type": "2.4.5:revenue", "rg_player_type": "standard",
"rg_publisher": "HGTV", "rg_action": "Impression", ...}
You get the idea? I basically need to split up all the params on the request
line into K/V pairs. Now, these values are in an arbitrary order. They are
also not always all there. The set of pairs is dynamic (we add and remove
various beacons as we conduct more experiments). Some of the fields are URL
encoded and need to be decoded as well.
Ok, just restating the problem to be sure I am understanding it correctly.
Currently mmnormalize has fairly poor support for name-value pairs. It has the
'iptables' datatype, which can handle whitespace-separated name=value pairs
(bad name for it, especially since it's the way to handle RFC5424 log messages
as well :-).
But in your case, the separator between name=value pairs is '&' instead of a
space.
It sounds like what's needed is a namevalue module that lets you specify:
1. the separator character between name-value pairs (default whitespace)
2. the separator character inside name-value pairs (default '=')
3. the quote character (default '"')
4. the escape character (default none)
   Do we need to support doubled-quote escaping, where "" means a literal "
   inside the text?
5. the root variable (default none). In the example above, you'd want something
   like getparams{rg_type:....} instead of having rg_type as a top-level variable.
6. a separator character to terminate the match? Do we need this, or do we just
   put that character after the call to the namevalue datatype?
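To make the options above concrete, here is a rough Python sketch of the
behaviour I have in mind. It is only an illustration of the semantics, not
mmnormalize or liblognorm code; the function name parse_name_value and all of
its parameter names are made up for this example. It covers options 1-5 and
leaves out doubled-quote escaping and the terminator from option 6.

```python
def parse_name_value(text, pair_sep=None, kv_sep="=", quote='"',
                     escape=None, root=None):
    """Split text into name/value pairs (hypothetical sketch).

    pair_sep=None means "any whitespace" (option 1's default).
    """
    def is_pair_sep(ch):
        return ch.isspace() if pair_sep is None else ch == pair_sep

    tokens = []      # completed "name=value" chunks
    current = []     # characters of the chunk being collected
    in_quote = False
    i = 0
    while i < len(text):
        ch = text[i]
        if escape is not None and ch == escape and i + 1 < len(text):
            # Option 4: take the next character literally.
            current.append(text[i + 1])
            i += 2
            continue
        if quote and ch == quote:
            # Option 3: toggle quoting and drop the quote character itself.
            in_quote = not in_quote
        elif is_pair_sep(ch) and not in_quote:
            # Option 1: a separator outside quotes ends the current pair.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if current:
        tokens.append("".join(current))

    pairs = {}
    for token in tokens:
        # Option 2: split on the first name/value separator only.
        name, _, value = token.partition(kv_sep)
        pairs[name] = value

    # Option 5: optionally nest everything under a root variable.
    return {root: pairs} if root else pairs
```

With the defaults this behaves like the iptables case
(parse_name_value('IN=eth0 OUT= SRC=10.0.0.1')), and with pair_sep='&' and
root='getparams' it handles the request parameters from the example, minus
the URL decoding.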
iptables would be a special case of this. URL parsing is common enough that it's
probably worth making a urlparse datatype with the appropriate defaults for it.
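As a sketch of what such URL defaults would produce, Python's standard
urllib.parse already does the '&' / '=' splitting and the percent-decoding.
The helper name parse_request_params and the "getparams" root are just
placeholders for this example, not an existing rsyslog interface.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_request_params(request_target, root="getparams"):
    """Return the query parameters of a request target as a nested dict."""
    # Everything after '?' (and before any '#') is the query string.
    query = urlsplit(request_target).query
    # keep_blank_values=True keeps fields such as rg_domain_category_id=
    # that are present but empty; values are percent-decoded for us.
    pairs = parse_qsl(query, keep_blank_values=True)
    # If a name repeats, the last value wins in this simple sketch.
    return {root: dict(pairs)}


if __name__ == "__main__":
    target = ("/events?rg_type=2.4.5:revenue&rg_publisher=HGTV"
              "&rg_domain_category_id=&rg_page_host_url=Scripting%20Error")
    print(parse_request_params(target))
```

Order and presence of the fields do not matter here, which matches the
requirement that the set of pairs is dynamic.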
Gary, does something like this sound like what you need?
Rainer, I'm thinking that this should be fairly easy to implement (generalizing
the iptables datatype). Does it sound that way to you?
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog