Hi,

You could serialize the records with Avro, ship them through Flume's Avro
sink into HDFS, and process the files with Hive. Since the log looks well
formatted, parsing it should be easy.
http://flume.apache.org/FlumeDeveloperGuide.html => Avro RPC Client

Example:
http://flume.apache.org/FlumeUserGuide.html => search for Avro
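
Before the records can be serialized, the log has to be split into events.
The following is only a rough sketch of that step, assuming each event starts
with an unindented date line and is followed by indented "Name = value" lines,
as in the sample below. The names parse_radius_log, WANTED and logged_at are
made up for illustration, and the sample values ("alice", "bob") are
placeholders, not real data. It prints one JSON object per event with just the
selected fields; the same dicts could be handed to an Avro writer instead.

```python
import json
import re

# Record header, e.g. "Tue Aug  7 00:00:00 2012" (day may be space-padded).
HEADER_RE = re.compile(r"^\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")
# Indented attribute line, e.g. '        User-Name = "xxxxxxxx"'.
ATTR_RE = re.compile(r"^\s+([\w-]+) = (.*)$")

# Hypothetical selection of fields to keep for each event.
WANTED = ("User-Name", "Acct-Status-Type", "Acct-Session-Time")

# Made-up sample in the same layout as the log in the question.
SAMPLE = """\
Tue Aug  7 00:00:00 2012
        User-Name = "alice"
        Acct-Status-Type = Start
        Proxy-State = OSC-Extended-Id=40682
        Timestamp = 1344286800

Tue Aug  7 00:00:00 2012
        User-Name = " bob "
        Acct-Status-Type = Stop
        Acct-Session-Time = 1348
        Timestamp = 1344286800
"""


def parse_radius_log(lines):
    """Yield one dict per event; a repeated attribute keeps its last value."""
    record = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if HEADER_RE.match(line.strip()):
            # A new header closes the previous event, if any.
            if record is not None:
                yield record
            record = {"logged_at": line.strip()}
            continue
        match = ATTR_RE.match(line)
        if match and record is not None:
            key, value = match.group(1), match.group(2).strip()
            # Drop surrounding double quotes and the padding inside them.
            if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
                value = value[1:-1].strip()
            record[key] = value
    if record is not None:
        yield record


for event in parse_radius_log(SAMPLE.splitlines()):
    # Keep only the wanted fields; missing ones become null in JSON.
    print(json.dumps({name: event.get(name) for name in WANTED}))
```

All values stay strings here; converting counters such as Acct-Session-Time
to integers would be a separate step once the Hive schema is decided.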

cheers
- Alex


On Aug 30, 2012, at 12:18 PM, Manu Moncy K <[email protected]> 
wrote:

> Tue Aug  7 00:00:00 2012
>        User-Name = "xxxxxxxx"
>        NAS-Port = xxxxxxxx
>        NAS-IP-Address = xxxxxxxx
>        Framed-IP-Address = xxxxxxxx
>        Filter-Id = " xxxxxxxx "
>        Class = " xxxxxxxx "
>        NAS-Identifier = " xxxxxxxx "
>        Acct-Status-Type = xxxxxxxx
>        Acct-Delay-Time = 0
>        Acct-Session-Id = " xxxxxxxx "
>        Acct-Authentic = RADIUS
>        Event-Timestamp = 1344286800
>        NAS-Port-Type = Ethernet
>        Calling-Station-Id = " xxxxxxxx "
>        NAS-Port-Id = " xxxxxxxx "
>        Service-Type = Framed-User
>        Framed-Protocol = PPP
>        Acct-Link-Count = 0
>        RB-Agent-Circuit-Id = " xxxxxxxx "
>        DSLForum-Agent-Circuit-Id = " xxxxxxxx "
>        DSLForum-Access-Loop-Encapsulation = ""
>        Timestamp = 1344286800
>        OSC-Service-Identifier = "DSLUsers"
>        Proxy-State = OSC-Extended-Id=40682
>        Timestamp = 1344286800
> 
> Tue Aug  7 00:00:00 2012
>        User-Name = " xxxxxxxx "
>        NAS-Port = xxxxxxxx
>        NAS-IP-Address = xxxxxxxx
>        Framed-IP-Address = xxxxxxxx
>        Class = "44620232:04:"
>        NAS-Identifier = " xxxxxxxx "
>        Acct-Status-Type = Stop
>        Acct-Delay-Time = 0
>        Acct-Input-Octets = 6021
>        Acct-Output-Octets = 323749
>        Acct-Session-Id = " xxxxxxxx "
>        Acct-Authentic = RADIUS
>        Acct-Session-Time = 1348
>        Acct-Input-Packets = 53
>        Acct-Output-Packets = 3187
>        Acct-Terminate-Cause = User-Request
>        Acct-Input-Gigawords = 0
>        Acct-Output-Gigawords = 0
>        Event-Timestamp = 1344286800
>        NAS-Port-Type = Ethernet
>        Calling-Station-Id = " xxxxxxxx "
>        NAS-Port-Id = " xxxxxxxx "
>        Service-Type = Framed-User
>        Framed-Protocol = PPP
>        Acct-Link-Count = 0
>        Timestamp = 1344286800
>        OSC-Service-Identifier = "DSLUsers"
>        Proxy-State = OSC-Extended-Id=24386
>        Timestamp = 1344286800
> 
> 
> Above given log format (2 events given) is the RADIUS LOG I am working on,
> I wanted to know if there is a way i can use flume and put this log into
> hive in JSON format and take the required fields for each event.
> -- 
> Manu K Moncy
> Data Scientist
> Flutura Business Solutions Pvt. Ltd
> Electronics and Communication Engineering(2008-2012)
> Govt. Model Engineering College,
> Cochin - 21
> ☎: +91-9740245341
> ☎: +91-9895163190
> ✉: [email protected]
> ✉: [email protected]


--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
