Re: Snappy Compression Json Data

Raghu Angadi Fri, 14 Oct 2011 17:13:20 -0700

If STORE worked, LOAD should work fine too.
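
As a rough sketch of the round trip, the compression properties from the quoted message can also be set from within the Pig script itself. This assumes the cluster has the Snappy native libraries installed and SnappyCodec listed in io.compression.codecs. It also assumes PigJsonLoader (the loader from the original post) is already registered. The alias `back` is made up for illustration.

```
-- Sketch only: assumes SnappyCodec is available on the cluster and
-- PigJsonLoader (from the original post) is registered.
SET mapred.output.compress true;
SET mapred.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;

raw  = LOAD '$INPUT' USING PigJsonLoader();
uniq = DISTINCT raw;
STORE uniq INTO '$OUTPUT' USING PigStorage();

-- In a later script: read the '.snappy' part files back.
-- The codec is normally picked from the file extension.
back = LOAD '$OUTPUT' USING PigStorage();
```

Note that PigStorage writes delimited text rather than JSON, so in this sketch the stored data is read back with PigStorage and not with the JSON loader.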

On Thu, Oct 13, 2011 at 6:29 PM, Cameron Gandevia <[email protected]> wrote:


> Hi
>
> I currently have a bunch of data in JSON format in HDFS. I would like to use
> Pig to load it, dedupe it, and store it back using Snappy compression.
>
> Currently I do something like this.
>
> raw = LOAD '$INPUT' USING PigJsonLoader();
> uniq = DISTINCT raw;
> STORE uniq INTO '$OUTPUT' USING PigStorage();
>
> If I add the following to the Pig job, it seems to write the files with a
> '.snappy' extension:
>
> <property>
>   <name>mapred.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapred.output.compression.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapred.output.compression.type</name>
>   <value>BLOCK</value>
> </property>
>
> Is this all I need to do, or do I need to write it in a different format?
> Also, is there a way to load the Snappy-compressed JSON data, or do I need to
> implement a new load function?
>
> Any help is much appreciated.
>
> Thanks
>
