Oh, that really helps. My bad, I didn't read that clearly. In fact, I'm already reading .gz files. My concern was whether the job will run efficiently without unzipping the .gz files first, since decompressing them might itself take a while for my input size.
I have around 20K input files, each ~250KB, already in .gz format. Also, I'm not storing them in HDFS but reading directly from the local file system. To get the processing split across multiple files, should I decompress them and recompress with a Snappy utility before running the job, or just run them directly as .gz input?
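For what it's worth, here is a minimal sketch of the bulk-decompression half of that "decompress, then recompress" step, using only the Python standard library. The directory names and the `gunzip_all` helper are made up for illustration; the actual Snappy recompression would need an external tool (e.g. `snzip`, or letting Hadoop's SnappyCodec write the output), which is why it's only noted in a comment rather than implemented:

```python
import gzip
import shutil
from pathlib import Path

def gunzip_all(src_dir: str, dst_dir: str) -> int:
    """Decompress every .gz file in src_dir into dst_dir; return the count.

    The plain files produced here would then be recompressed with a
    Snappy tool (e.g. snzip, or Hadoop's SnappyCodec on write) -- that
    step needs a library outside the stdlib, so it is not shown.
    """
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for gz_path in Path(src_dir).glob("*.gz"):
        target = out / gz_path.stem  # drop the trailing .gz suffix
        # Stream-copy so a file never has to fit in memory at once
        with gzip.open(gz_path, "rb") as fin, open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        count += 1
    return count
```

Note that with files of only ~250KB each (well under one HDFS block), each file is already its own split regardless of codec, so the recompression may matter less than consolidating the many small files.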
