Hi,

You could use Avro to serialize the records, transfer them through Flume's Avro sink into HDFS, and process the files with Hive. Since the log looks well formatted, it should be easy.

http://flume.apache.org/FlumeDeveloperGuide.html => Avro RPC Client
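As a rough illustration of the parsing step only (not of Flume or Avro themselves), here is a minimal Python sketch that turns the quoted log into one record per event and keeps just the fields you need. It assumes the real file has a timestamp header line, then one "Name = value" attribute per line, with a blank line between events. The function names, the "log_time" key, and the sample field selection are made up for this example. Repeated attributes such as Timestamp simply keep the last value seen.

```python
import json


def parse_radius_detail(text):
    """Split a RADIUS log into one dict per event (assumed layout, see above)."""
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current event.
            if current:
                events.append(current)
            current = None
            continue
        if "=" in line:
            if current is None:
                current = {}
            # Split on the first "=" only, so values like
            # "OSC-Extended-Id=40682" stay intact.
            key, _, value = line.partition("=")
            # Drop surrounding quotes and padding; a repeated key keeps the last value.
            current[key.strip()] = value.strip().strip('"').strip()
        else:
            # Assumed to be the event header, e.g. "Tue Aug 7 00:00:00 2012".
            if current:
                events.append(current)
            current = {"log_time": line}
    if current:
        events.append(current)
    return events


def select_fields(event, wanted):
    """Keep only the wanted attributes; missing ones become None."""
    return {name: event.get(name) for name in wanted}


def to_json_lines(events, wanted):
    """Render each event as one JSON document per line."""
    return [json.dumps(select_fields(e, wanted), sort_keys=True) for e in events]


# Shortened sample based on the two events quoted in this thread.
SAMPLE = """\
Tue Aug 7 00:00:00 2012
    User-Name = "xxxxxxxx"
    Acct-Status-Type = xxxxxxxx
    Proxy-State = OSC-Extended-Id=40682
    Timestamp = 1344286800

Tue Aug 7 00:00:00 2012
    User-Name = " xxxxxxxx "
    Acct-Status-Type = Stop
    Acct-Session-Time = 1348
    Proxy-State = OSC-Extended-Id=24386
    Timestamp = 1344286800
"""

if __name__ == "__main__":
    wanted = ["log_time", "User-Name", "Acct-Status-Type", "Acct-Session-Time"]
    for line in to_json_lines(parse_radius_detail(SAMPLE), wanted):
        print(line)
```

The same dicts could be handed to an Avro writer instead of json.dumps if you go the Avro route; the schema would then have to list the attributes you want to keep.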
Example: http://flume.apache.org/FlumeUserGuide.html => search for Avro

cheers
- Alex

On Aug 30, 2012, at 12:18 PM, Manu Moncy K <[email protected]> wrote:

> Tue Aug 7 00:00:00 2012
> User-Name = "xxxxxxxx"
> NAS-Port = xxxxxxxx
> NAS-IP-Address = xxxxxxxx
> Framed-IP-Address = xxxxxxxx
> Filter-Id = " xxxxxxxx "
> Class = " xxxxxxxx "
> NAS-Identifier = " xxxxxxxx "
> Acct-Status-Type = xxxxxxxx
> Acct-Delay-Time = 0
> Acct-Session-Id = " xxxxxxxx "
> Acct-Authentic = RADIUS
> Event-Timestamp = 1344286800
> NAS-Port-Type = Ethernet
> Calling-Station-Id = " xxxxxxxx "
> NAS-Port-Id = " xxxxxxxx "
> Service-Type = Framed-User
> Framed-Protocol = PPP
> Acct-Link-Count = 0
> RB-Agent-Circuit-Id = " xxxxxxxx "
> DSLForum-Agent-Circuit-Id = " xxxxxxxx "
> DSLForum-Access-Loop-Encapsulation = ""
> Timestamp = 1344286800
> OSC-Service-Identifier = "DSLUsers"
> Proxy-State = OSC-Extended-Id=40682
> Timestamp = 1344286800
>
> Tue Aug 7 00:00:00 2012
> User-Name = " xxxxxxxx "
> NAS-Port = xxxxxxxx
> NAS-IP-Address = xxxxxxxx
> Framed-IP-Address = xxxxxxxx
> Class = "44620232:04:"
> NAS-Identifier = " xxxxxxxx "
> Acct-Status-Type = Stop
> Acct-Delay-Time = 0
> Acct-Input-Octets = 6021
> Acct-Output-Octets = 323749
> Acct-Session-Id = " xxxxxxxx "
> Acct-Authentic = RADIUS
> Acct-Session-Time = 1348
> Acct-Input-Packets = 53
> Acct-Output-Packets = 3187
> Acct-Terminate-Cause = User-Request
> Acct-Input-Gigawords = 0
> Acct-Output-Gigawords = 0
> Event-Timestamp = 1344286800
> NAS-Port-Type = Ethernet
> Calling-Station-Id = " xxxxxxxx "
> NAS-Port-Id = " xxxxxxxx "
> Service-Type = Framed-User
> Framed-Protocol = PPP
> Acct-Link-Count = 0
> Timestamp = 1344286800
> OSC-Service-Identifier = "DSLUsers"
> Proxy-State = OSC-Extended-Id=24386
> Timestamp = 1344286800
>
>
> Above given log format (2 events given) is the RADIUS LOG I am working on,
> I wanted to know if there is a way i can use flume and put this log into
> hive in JSON format and take the required fields for each event.
> --
> Manu K Moncy
> Data Scientist
> Flutura Business Solutions Pvt. Ltd
> Electronics and Communication Engineering(2008-2012)
> Govt. Model Engineering College,
> Cochin - 21
> ☎: +91-9740245341
> ☎: +91-9895163190
> ✉: [email protected]
> ✉: [email protected]

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
