Re: Support for reading 4mc compressed text input files in apache crunch

Suyash Agarwal Mon, 21 May 2018 01:50:55 -0700

Hi Gabriel,

Yes, I was able to run a MapReduce job with 4mc-compressed input.
I had to use a different input format class:
https://github.com/carlomedas/4mc/blob/master/java/hadoop-4mc/src/main/java/com/hadoop/mapreduce/FourMcTextInputFormat.java


I was able to make it work in Crunch by creating a custom source
implementation:

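import com.hadoop.mapreduce.FourMcTextInputFormat;
import org.apache.crunch.io.ReadableSource;
import org.apache.crunch.io.impl.FileSourceImpl;
import org.apache.crunch.types.PType;
import org.apache.hadoop.fs.Path;

public class FourMCInputSource<T> extends FileSourceImpl<T> implements
    ReadableSource<T> {
  public FourMCInputSource(Path path, PType<T> ptype) {
    super(path, ptype, FourMcTextInputFormat.class);
  }
}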
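
For reference, a minimal sketch of how a source like this could be plugged
into a pipeline. It assumes the 4mc Hadoop jar and its native libraries are
available to the job. The class name FourMcReadExample and the use of
command-line arguments for the input and output paths are made up for
illustration; the source class is repeated so the example stands alone.

```java
import com.hadoop.mapreduce.FourMcTextInputFormat;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.PipelineResult;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.ReadableSource;
import org.apache.crunch.io.impl.FileSourceImpl;
import org.apache.crunch.types.PType;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Same source as above, repeated so the example stands alone.
class FourMCInputSource<T> extends FileSourceImpl<T> implements ReadableSource<T> {
  FourMCInputSource(Path path, PType<T> ptype) {
    super(path, ptype, FourMcTextInputFormat.class);
  }
}

// Hypothetical driver: args[0] is the 4mc-compressed input, args[1] the output directory.
class FourMcReadExample {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(FourMcReadExample.class, new Configuration());

    // Read each line of the compressed file as a String.
    PCollection<String> lines =
        pipeline.read(new FourMCInputSource<>(new Path(args[0]), Writables.strings()));

    // Write the lines back out as plain text to confirm they were decoded.
    pipeline.writeTextFile(lines, args[1]);

    PipelineResult result = pipeline.done();
    System.exit(result.succeeded() ? 0 : 1);
  }
}
```

Writing the lines straight back out is only a smoke test; any further
processing would go between the read and the write.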

I'm not sure whether this is the right approach.

Thanks.


On Fri, May 18, 2018 at 8:04 PM, Gabriel Reid <[email protected]>
wrote:

> Hi Suyash,
>
> Could you post a bit more of your stack trace and information about
> which Hadoop version you're running on?
>
> Also, have you tried running a simple MapReduce job (e.g. word count)
> that operates on this file to ensure that 4mc-compression is working
> correctly on your cluster?
>
> - Gabriel
>
>
> On Wed, May 16, 2018 at 1:55 PM, Suyash Agarwal <[email protected]>
> wrote:
> > Hi,
> >
> > `new TextFileSource<>(<4mc compressed input>, strings())` fails with the
> > error:
> > java.lang.NullPointerException: null
> > at com.hadoop.compression.fourmc.Lz4Decompressor.reset(Lz4Decompressor.java:234).
> >
> > And passing `(Class<? extends FileInputFormat<?, ?>>)
> > FourMcTextInputFormat.class` to From.formattedFile() as the format class
> > fails with a class cast exception.
> >
> > So, how can I read the 4mc compressed input file in Crunch?
> >
> > Thanks.
> >
>
