Hello,

I have a small cluster up and running with LZO compressed files in it.  I'm
using the lzo compression libraries available at
http://github.com/kevinweil/hadoop-lzo (thank you for maintaining this!)

So far everything works fine when I write regular map-reduce jobs.  I can
read in lzo files and write out lzo files without any problem.

I'm also using Pig 0.7 and it appears to be able to read LZO files out of
the box using the default LoadFunc (PigStorage).  However, I am currently
testing a large LZO file (20GB) which I indexed using the LzoIndexer and Pig
does not appear to be making use of the index.  The Pig scripts that I've
run so far only use 3 mappers when processing the 20GB file.  My
understanding was that there should be 1 map for each block (256MB blocks)
so about 80 mappers when processing the 20GB lzo file.  Does Pig 0.7 support
indexed lzo files with the default load function?
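For reference, the mapper count I'm expecting is just simple block arithmetic (a sketch, assuming one map task per HDFS block once the file is splittable via its .lzo.index):

```python
# Expected number of map tasks for a splittable (indexed) LZO file,
# assuming roughly one map per HDFS block.
file_size_mb = 20 * 1024   # 20GB file
block_size_mb = 256        # HDFS block size

expected_mappers = file_size_mb // block_size_mb
print(expected_mappers)    # 80
```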

If not, I was looking at elephant-bird and noticed it is only compatible
with Pig 0.6 and not 0.7+.  Is that accurate?  What would be the recommended
solution for processing indexed lzo files using Pig 0.7?

Thank you for any assistance!

~Ed
