Hi,

Several things you can try:
1) Try using com.twitter.elephantbird.pig.load.LzoPigStorage() and print out a few lines, just to make sure you can read clear text from the LZO files.
2) You can use this in combination with Pig's built-in function REGEX_EXTRACT(String expression, String regex, int matchIndex).
3) Have you tried LzoRegexLoader(String pattern)?

Cheers,
 Gerrit

On Mon, Mar 21, 2011 at 9:11 PM, Saptarshi Guha <[email protected]> wrote:
> Hello,
>
> I have some LZO files, which I
>
> a) indexed via DistributedLzoIndexer to create index files
> b) did not index, so just some LZO files in a directory.
>
> Using both approaches, I tried creating a subclass of LzoBaseRegexLoader
> that returns a pattern.
> Sadly, not a single line matched. This is not a problem of the regex
> (checked it works with other strings);
> I modified LzoBaseRegexLoader.java to print the strings coming in and
> I'm getting binary, e.g.
>
> http://pastebin.com/wAveGzDy
>
> I'm using Pig 0.8 and ElephantBird checked out from
> https://github.com/gerritjvv/elephant-bird
>
> Any suggestions?
>
> Saptarshi
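
A note on suggestion 2 in the reply: the sketch below is a rough stand-in, in plain Java, for what a call like REGEX_EXTRACT(expression, regex, matchIndex) is expected to do, namely return the capture group at matchIndex or null when nothing matches. It is not Pig's actual implementation. The class name, the helper regexExtract, the sample log line, and the regex are all made up for illustration. It may help to confirm a pattern against one line of clear text before wiring it into a loader, and it shows why undecoded (still compressed) input would simply yield no match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig or ElephantBird.
class RegexExtractSketch {

    // Returns the capture group at matchIndex, or null if the regex
    // does not match or the group index is out of range.
    static String regexExtract(String expression, String regex, int matchIndex) {
        if (expression == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(expression);
        if (m.find() && matchIndex >= 0 && matchIndex <= m.groupCount()) {
            return m.group(matchIndex);
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed sample of a decompressed log line.
        String clearText = "2011-03-21 GET /index.html 404";
        String regex = "(\\S+) (\\w+) (\\S+) (\\d+)";

        // Clear text: the fourth group is extracted.
        System.out.println(regexExtract(clearText, regex, 4)); // prints 404

        // Stand-in for bytes that were never decompressed: no match at all.
        String binaryLike = "\u0089LZO\u0000\r\n";
        System.out.println(regexExtract(binaryLike, regex, 4)); // prints null
    }
}
```

If the first call works on lines printed via suggestion 1 but the loader still matches nothing, the input reaching the loader is probably not decompressed text, which would point at the input format rather than the regex.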
