That looks like the right solution to me, though I wouldn't mind seeing the stack trace for the ClassCastException for the formattedFile() call if you have it handy!
On Mon, May 21, 2018 at 1:49 AM, Suyash Agarwal <[email protected]> wrote:
> Hi Gabriel,
>
> Yes, I was able to run MapReduce with 4mc-compressed input.
> I had to use a different input format class:
> https://github.com/carlomedas/4mc/blob/master/java/hadoop-4mc/src/main/java/com/hadoop/mapreduce/FourMcTextInputFormat.java
>
> I am able to make it work in Crunch by creating a different source
> implementation:
>
> public class FourMCInputSource<T> extends FileSourceImpl<T> implements
>     ReadableSource<T> {
>   public FourMCInputSource(Path path, PType<T> ptype) {
>     super(path, ptype, FourMcTextInputFormat.class);
>   }
> }
>
> Not sure if this is the right way.
>
> Thanks.
>
>
> On Fri, May 18, 2018 at 8:04 PM, Gabriel Reid <[email protected]> wrote:
>
>> Hi Suyash,
>>
>> Could you post a bit more of your stack trace and information about
>> which Hadoop version you're running on?
>>
>> Also, have you tried running a simple MapReduce job (e.g. word count)
>> that operates on this file, to ensure that 4mc compression is working
>> correctly on your cluster?
>>
>> - Gabriel
>>
>>
>> On Wed, May 16, 2018 at 1:55 PM, Suyash Agarwal <[email protected]> wrote:
>>
>> > Hi,
>> >
>> > `new TextFileSource<>(<4mc compressed input>, strings())` fails with the
>> > error:
>> >
>> > java.lang.NullPointerException: null
>> >     at com.hadoop.compression.fourmc.Lz4Decompressor.reset(Lz4Decompressor.java:234)
>> >
>> > And passing `(Class<? extends FileInputFormat<?, ?>>)
>> > FourMcTextInputFormat.class` as the format class in From.formattedFile()
>> > doesn't work either; it fails with a ClassCastException.
>> >
>> > So, how can I read a 4mc-compressed input file in Crunch?
>> >
>> > Thanks.
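For anyone finding this thread later, here is a minimal sketch of how the custom source above might be wired into a Crunch pipeline. It assumes the `FourMCInputSource` class from the email, plus the `crunch-core` and `hadoop-4mc` jars on the classpath; the input path and the driver class name are hypothetical, and this is untested against a real cluster:

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.fs.Path;

public class FourMcReadExample {
  public static void main(String[] args) {
    // Build a MapReduce-backed Crunch pipeline.
    Pipeline pipeline = new MRPipeline(FourMcReadExample.class);

    // Read the 4mc-compressed text file through the custom source,
    // which hands FourMcTextInputFormat to FileSourceImpl.
    PCollection<String> lines = pipeline.read(
        new FourMCInputSource<>(new Path("/data/input.4mc"), Writables.strings()));

    // ... transform `lines` as usual, then run the pipeline.
    pipeline.done();
  }
}
```

The key point is that the decompression choice rides along with the `InputFormat` passed to `FileSourceImpl`, so no other part of the pipeline needs to know the input was 4mc-compressed.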
