The big question is how the log file is formatted and how it needs to be parsed. I'd be inclined to write a UDF that takes a line of text and returns a tuple of the values you'd be storing in HBase.
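As a rough illustration of that idea, here is a minimal sketch of such a parser written in Python so that it could be registered as a Jython UDF. The log layout ("<date> <time> <LEVEL> <message>"), the field names, and the function name parse_line are all assumptions of mine, since the actual log format hasn't been posted. As far as I know, Pig supplies the outputSchema decorator when a script is registered with jython; the fallback below only exists so the same file can be imported in plain Python for testing.

```python
import re

# Pig is expected to provide outputSchema when this file is registered with
# jython. Outside Pig (for example in unit tests) it is undefined, so fall
# back to a decorator that does nothing.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

# ASSUMED log layout (hypothetical): "<date> <time> <LEVEL> <message>"
# e.g. "2014-02-26 00:22:13 INFO Server started"
LOG_LINE = re.compile(
    r'^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) ([A-Z]+) (.*)$'
)

@outputSchema("parsed:tuple(day:chararray, time:chararray, "
              "level:chararray, message:chararray)")
def parse_line(line):
    """Return (day, time, level, message), or None if the line doesn't match."""
    if line is None:
        return None
    match = LOG_LINE.match(line.strip())
    if match is None:
        # Unparseable lines come back as null so they can be filtered in Pig.
        return None
    return match.groups()
```

Because the parsing lives in an ordinary function, it can be checked with plain assertions before any job is launched. In the Pig script the returned tuples would then presumably be flattened and written out with something like HBaseStorage, but that part depends on your table layout.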
Then you could do other operations on the bag of tuples that gets passed back. Alternatively, you could write a regular expression and use a built-in Pig function like REGEX_EXTRACT or REGEX_EXTRACT_ALL.

I like the UDF approach in this case because I can more easily write unit tests around my log parser and get that testing out of the way before actually spawning any jobs.

On Wed, Feb 26, 2014 at 12:22 AM, Chhaya Vishwakarma <[email protected]> wrote:

> hi,
>
> I have a log file in HDFS which needs to be parsed and put in a Hbase
> table.
>
> I want to do this using PIG.
>
> How can i go about it. Pig script should parse the logs and then put in
> Hbase?
>
> Regards,
> Chhaya Vishwakarma
