Hello, I have a small cluster up and running with LZO-compressed files in it. I'm using the LZO compression libraries available at http://github.com/kevinweil/hadoop-lzo (thank you for maintaining them!)
So far everything works fine when I write regular MapReduce jobs: I can read LZO files and write LZO files without any problem. I'm also using Pig 0.7, and it appears to read LZO files out of the box with the default LoadFunc (PigStorage).

However, I'm currently testing against a large (20 GB) LZO file that I indexed with the LzoIndexer, and Pig does not appear to be making use of the indexes. The Pig scripts I've run so far launch only 3 mappers when processing the 20 GB file. My understanding was that there should be one map per block (256 MB blocks), so roughly 80 mappers for the 20 GB LZO file.

Does Pig 0.7 support indexed LZO files with the default load function? If not: I was looking at elephant-bird and noticed it is listed as compatible only with Pig 0.6, not 0.7+. Is that accurate? What would be the recommended way to process indexed LZO files with Pig 0.7?

Thank you for any assistance!
~Ed
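P.S. For reference, the scripts I've been testing are essentially of this shape (the paths, delimiter, and aliases are just placeholders, not my real data):

```
-- load the indexed 20 GB LZO file with the default loader
A = LOAD '/data/big_file.lzo' USING PigStorage('\t');

-- trivial pass-through, just to observe how many mappers get launched
B = FOREACH A GENERATE $0;

STORE B INTO '/data/out';
```

Even a minimal pass-through like this only gets 3 mappers, so it seems to be the input-split calculation rather than anything in the script itself.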