I am processing a data source containing log files. Each record contains a 
number of fields, but one field has a string value that is a dump of the log 
records for a given day. The log entries contained within this string "blob" are 
not fixed length, but do follow a set pattern. I can break this blob down into 
individual log entries fairly easily using Java. The basic pattern is 
[<date>][<component>][<msgtype>] <message text> and a given blob may contain up 
to 100 such records. Can this be broken down using Pig? I want to output only 
the log entries that have specific message types.
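
For reference, here is a minimal sketch of the Java-side split described above. It assumes each entry starts with exactly three bracketed fields and that the message text runs until the next such header or the end of the blob. The class name LogBlobSplitter and the methods split and filterByType are hypothetical, chosen only for illustration, and a message that itself contains three consecutive bracketed groups would be split incorrectly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a log "blob" into [date, component, msgtype, message] entries.
final class LogBlobSplitter {
    // Three bracketed header fields, then the message, matched lazily up to
    // the next three-field header or the end of the input.
    private static final String HEADER = "\\[([^\\]]*)\\]\\[([^\\]]*)\\]\\[([^\\]]*)\\]";
    private static final Pattern ENTRY = Pattern.compile(
            HEADER + "\\s*(.*?)(?=\\[[^\\]]*\\]\\[[^\\]]*\\]\\[[^\\]]*\\]|\\z)",
            Pattern.DOTALL); // DOTALL lets a message span several lines

    // Returns one String[4] per entry: date, component, msgtype, message.
    static List<String[]> split(String blob) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(blob);
        while (m.find()) {
            entries.add(new String[] {
                m.group(1), m.group(2), m.group(3), m.group(4).trim()
            });
        }
        return entries;
    }

    // Keeps only the entries whose msgtype equals the requested type.
    static List<String[]> filterByType(String blob, String msgType) {
        List<String[]> matches = new ArrayList<>();
        for (String[] entry : split(blob)) {
            if (entry[2].equals(msgType)) {
                matches.add(entry);
            }
        }
        return matches;
    }
}
```

Calling LogBlobSplitter.filterByType(blob, "ERROR") would return only the entries whose third bracketed field is ERROR. The same splitting logic is what a UDF would need to wrap.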

Can someone point me to any examples where Pig is used to break down a string 
into substrings based on a pattern?
If I create a UDF using Java to break down the string into substrings, can I 
return the substrings as a list to Pig?
    If so, how do I iterate through the list in Pig?

Ron W.
